
AWS Certified Solutions Architect & Developer & SysOps Administrator Associate


IAM

Users & Groups

IAM: Identity and Access Management, a global service

  • Root account created by default, shouldn’t be used or shared
  • Users are people within your organization, and can be grouped
  • Groups only contain users, not other groups
  • Users don’t have to belong to a group, and a user can belong to up to 10 groups
  • The user gains the permissions applied to the group through the policy

Note

  • IAM Users and Roles are IAM Identities, while User Groups are not and cannot be authenticated or authorized
  • User groups are used for collecting users with common needs and then applying IAM permissions policies to them

Policy

  • Policies are documents that define permissions and are written in JSON
  • Identity-based policies can be applied to users, groups, and roles
  • All permissions are implicitly denied by default
  • In AWS you apply the least privilege principle: don’t give more permissions than a user needs

Types of Policy

  • Identity-based policies
    • Attached to users, groups, or roles
    • Control what actions an identity can perform, on which resources, and under what conditions
  • Resource-based policies – attached to a resource; define permissions for a principal accessing the resource
  • IAM permissions boundaries – an advanced feature in which you set the maximum permissions that an identity-based policy can grant to an IAM entity
  • Access control lists (ACLs) – control which principals in other accounts can access the resource to which the ACL is attached
  • AWS Organizations service control policies (SCP) – specify the maximum permissions for an organization or OU
  • Session policies – used with AssumeRole API actions

🔗AWS Document

Determination Rules

  1. By default, all requests are implicitly denied (though the root user has full access)
  2. An explicit allow in an identity-based or resource-based policy overrides this default
  3. If a permissions boundary, Organizations SCP, or session policy is present, it might override the allow with an implicit deny
  4. An explicit deny in any policy overrides any allows

IAM Policies Structure

  • Version: policy language version, always include "2012-10-17"
  • Id: an identifier for the policy (optional)
  • Statement: one or more individual statements (required)
    • Sid: an identifier for the statement (optional)
    • Effect: whether the statement allows or denies access (Allow, Deny)
    • Principal: the account / user / role to which this policy applies
    • Action: list of actions the policy allows or denies (access levels: List, Read, Permissions Management, Write, Tagging)
    • Resource: list of resources to which the actions apply
    • Condition: conditions for when this policy is in effect (optional)
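
As an illustration, here is a minimal identity-based policy created with the AWS CLI (the bucket and policy names are placeholders):

# Write a sample policy document (Principal is omitted - it is only
# used in resource-based policies)
cat > read-only-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Id": "ExamplePolicy",
  "Statement": [
    {
      "Sid": "AllowS3Read",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-example-bucket",
        "arn:aws:s3:::my-example-bucket/*"
      ]
    }
  ]
}
EOF

# Register it as a customer managed policy
aws iam create-policy --policy-name MyS3ReadOnly \
  --policy-document file://read-only-policy.json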

Inline vs Managed Policies

  • AWS Managed Policy
    • Maintained by AWS
    • Good for power users and administrators
    • Updated in case of new services / new APIs
  • Customer Managed Policy
    • Best Practice, re-usable, can be applied to many principals
    • Version Controlled + rollback, central change management
  • Inline
    • Strict one-to-one relationship between policy and principal
    • Policy is deleted if you delete the IAM principal

An inline policy is a policy that's embedded in an IAM identity (a user, group, or role)


IAM Security Tools

  • Credentials Report (account-level)
    • A report that lists all your account's users and the status of their various credentials
  • Access Advisor (user-level)
    • Access advisor shows the service permissions granted to a user and when those services were last accessed.
    • You can use this information to revise your policies.

IAM Roles

Some AWS services need to perform actions on your behalf -> to do so, you assign permissions to AWS services with IAM Roles

  • An IAM role is an IAM identity that has specific permissions
  • Roles are assumed by users, applications, and services
  • Once assumed, the identity "becomes" the role and gains the role’s permissions
  • Common roles:
    • EC2 Instance Roles
    • Lambda Function Roles
    • Roles for CloudFormation

AWS STS – Security Token Service

  • Allows you to grant limited and temporary access to AWS resources (from 15 minutes up to the role’s maximum session duration; 1 hour by default)
  • Credentials include:
    • AccessKeyId
    • Expiration
    • SecretAccessKey
    • SessionToken
  • Trust policies control who can assume the role
  • Temporary credentials are used with identity federation, delegation, cross-account access, and IAM roles
  • APIs:
    • sts:AssumeRole: Assume roles within your account or cross account
    • sts:AssumeRoleWithSAML: return credentials for users logged with SAML
    • sts:AssumeRoleWithWebIdentity
      • return creds for users logged with an IdP (Facebook Login, Google Login, OIDC compatible…)
      • AWS recommends against using this directly; use Cognito Identity Pools instead
    • sts:GetSessionToken: for MFA, from a user or AWS account root user
    • sts:GetFederationToken: obtain temporary creds for a federated user
    • sts:GetCallerIdentity: return details about the IAM user or role used in the API call
    • sts:DecodeAuthorizationMessage: decode error message when an AWS API is denied
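
As a quick sketch, assuming a role and checking the resulting identity from the CLI (the role ARN and session name are placeholders):

$ aws sts assume-role \
    --role-arn arn:aws:iam::123456789012:role/MyAppRole \
    --role-session-name demo-session
# Returns temporary AccessKeyId, SecretAccessKey, SessionToken and Expiration

$ aws sts get-caller-identity
# Shows the account, ARN and user ID of the identity making the call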

IAM Guidelines & Best Practices

  • Don’t use the root account except for AWS account setup
  • One physical user = One AWS user
  • Assign users to groups and assign permissions to groups
  • Create a strong password policy
  • Use and enforce the use of Multi Factor Authentication (MFA)
  • Create and use Roles for giving permissions to AWS services
  • Use Access Keys for Programmatic Access (CLI / SDK)
  • Audit permissions of your account with the IAM Credentials Report
  • Never share IAM users & Access Keys
  • Never store IAM access keys on any machine other than your own personal computer or on-premises server
  • For on-premises servers, the best practice is to call the AWS Security Token Service to obtain temporary security credentials

EC2

EC2: Elastic Compute Cloud
EC2 Instance: AMI (OS) + Instance Size (CPU + RAM) + Storage + security groups + EC2 User Data

Sizing & Configuration Options

  • Operating System (OS): Linux, Windows or MacOS
  • Compute power & Cores (CPU)
  • Random-access Memory (RAM)
  • Storage Space:
    • Network-attached (EBS & EFS)
    • Hardware (EC2 Instance Store)
  • Network Card: speed of the card, Public IP address
  • Firewall Rules: security group
  • Bootstrap Script (configure at first launch): EC2 User Data

EC2 User Data

  • It is possible to bootstrap instances using an EC2 User data script
  • Bootstrapping means launching commands when a machine starts
  • The script runs only once, at the instance’s first start
  • EC2 user data is used to automate boot tasks such as:
    • Installing updates
    • Installing software
    • Downloading common files from the internet
    • Anything you can think of
  • The EC2 User Data Script runs with the root user
  • Limited to 16 KB
  • Batch and PowerShell scripts can be run on Windows
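
For example, a minimal Linux User Data script (Amazon Linux 2 assumed) that installs and starts a web server on first boot:

#!/bin/bash
# Runs once, as root, at the instance's first start
yum update -y
yum install -y httpd
systemctl enable --now httpd
echo "Hello from $(hostname -f)" > /var/www/html/index.html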

EC2 Metadata

  • Instance metadata is data about your EC2 instance
  • Instance metadata is available at http://169.254.169.254/latest/meta-data

Example

$ curl http://169.254.169.254/latest/meta-data/local-ipv4
172.31.42.238

Instance Types

🔗Homepage

  • General Purpose:
    • Great for a diversity of workloads such as web servers or code repositories
    • Balance between Compute, Memory and Networking
  • Compute Optimized:
    • Great for compute-intensive tasks that require high performance processors
    • Use cases:
      • Batch processing workloads
      • Media transcoding
      • High performance web servers
      • High performance computing (HPC)
      • Scientific modeling & machine learning
      • Dedicated gaming servers
  • Memory Optimized:
    • Fast performance for workloads that process large data sets in memory
    • Use cases:
      • High performance, relational/non-relational databases
      • Distributed web scale cache stores
      • In-memory databases optimized for BI (business intelligence)
      • Applications performing real-time processing of big unstructured data
  • Storage Optimized:
    • Great for storage-intensive tasks that require high, sequential read and write access to large data sets on local storage
    • Use cases:
      • High frequency online transaction processing (OLTP) systems
      • Relational & NoSQL databases
      • Cache for in-memory databases (for example, Redis)
      • Data warehousing applications
      • Distributed file systems

Note

Naming convention:

e.g. m5.2xlarge

  • m: instance class
  • 5: generation
  • 2xlarge: size within the instance class

Tip

Stopping and starting the instance will also move it to different underlying hardware

EC2 Instance Lifecycle


Stopping EC2 instances

  • EBS backed instances only
  • No charge for stopped instances
  • EBS volumes remain attached (chargeable)
  • Data in RAM is lost
  • Instance is migrated to a different host
  • Private IPv4 addresses and IPv6 addresses retained; Public IPv4 addresses released
  • Associated Elastic IPs retained

Hibernating EC2 instances

  • We know we can stop, terminate instances
    • Stop – the data on disk (EBS) is kept intact in the next start
    • Terminate – any EBS volume set up to be destroyed (the root volume, by default) is lost
  • On start, the following happens:
    • First start: the OS boots & the EC2 User Data script is run
    • Following starts: the OS boots up
    • Then your application starts, caches get warmed up, and that can take time!
  • Introducing EC2 Hibernate:
    • The in-memory (RAM) state is preserved
    • The instance boot is much faster! (the OS is not stopped / restarted)
    • Under the hood: the RAM state is written to a file in the root EBS volume
    • The root EBS volume must be encrypted
    • Applies to on-demand or reserved Linux instances
    • Contents of RAM saved to EBS volume
    • Must be enabled for hibernation when launched
  • Use cases:
    • Long-running processing
    • Saving the RAM state
    • Services that take time to initialize
  • When started (after hibernation):
    • The EBS root volume is restored to its previous state
    • The RAM contents are reloaded
    • The processes that were previously running on the instance are resumed
    • Previously attached data volumes are reattached and the instance retains its instance ID
  • When an EC2 instance is hibernated, the following are charged:
    • EBS storage charges for in-memory data saved in EBS volumes
    • Elastic IP address charges which are associated with an instance
  • Good to know:
    • Supported Instance Families – C3, C4, C5, I3, M3, M4, R3, R4, T2, T3, ...
    • Instance RAM Size – must be less than 150 GB
    • Instance Size – not supported for bare metal instances
    • AMI – Amazon Linux 2, Linux AMI, Ubuntu, RHEL, CentOS & Windows...
    • Root Volume – must be encrypted EBS, not instance store
    • Available for On-Demand, Reserved and Spot Instances
    • An instance can NOT be hibernated more than 60 days
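
A hedged CLI sketch of enabling and using hibernation (the AMI and instance IDs are placeholders); hibernation must be configured at launch and the root EBS volume must be encrypted:

# Launch with hibernation enabled
aws ec2 run-instances --image-id ami-12345678 --instance-type m5.large \
  --hibernation-options Configured=true

# Later, hibernate instead of a plain stop
aws ec2 stop-instances --instance-ids i-0123456789abcdef0 --hibernate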

Rebooting EC2 instances

  • Equivalent to an OS reboot
  • DNS name and all IPv4 and IPv6 addresses retained
  • Does not affect billing

Retiring EC2 instances

  • Instances may be retired if AWS detects irreparable failure of the underlying hardware that hosts the instance
  • When an instance reaches its scheduled retirement date, it is stopped or terminated by AWS

Terminating EC2 instances

  • Means deleting the EC2 instance
  • Cannot recover a terminated instance
  • By default root EBS volumes are deleted

Recovering EC2 instances

  • CloudWatch can be used to monitor system status checks and recover instance if needed
  • Applies if the instance becomes impaired due to underlying hardware / platform issues
  • Recovered instance is identical to original instance

Security Groups

  • Security groups only contain allow rules
  • Security group rules can reference IP ranges or other security groups
  • Security groups are acting as a "firewall" on EC2 instances
  • They regulate:
    • Type + Protocol
      • Control of inbound network (from other to the instance)
      • Control of outbound network (from the instance to other)
    • Port Range
    • Source: Authorised IP ranges – IPv4 and IPv6 / other security group
  • Can be attached to multiple instances
  • Locked down to a region/VPC combination
  • Security groups live "outside" the EC2 instance – if traffic is blocked, the instance won’t see it
  • It’s good to maintain one separate security group for SSH access
  • If your application is not accessible (time out), then it’s a security group issue
  • If your application gives a "connection refused" error, then it’s an application error or it’s not launched
  • All inbound traffic is blocked by default
  • All outbound traffic is authorized by default
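
A minimal CLI sketch of the two kinds of inbound rules described above (the group IDs and CIDR are placeholders):

# Allow SSH from a specific IP range
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 22 --cidr 203.0.113.0/24

# Allow HTTP from instances belonging to another security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 80 --source-group sg-0fedcba9876543210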

Instances Purchasing Options

On-Demand Instances

  • Pay for what you use:
    • Amazon Linux, Windows and Ubuntu: billing per second, after the first minute
    • MacOS: billing per hour
    • Commercial Linux distros such as Red Hat EL and SUSE ES use hourly pricing
  • Highest cost but no upfront payment
  • No long-term commitment
  • Recommended for short-term and un-interrupted workloads, where you can't predict how the application will behave

Reserved Instance

  • You reserve specific instance attributes (Instance Type, Region, Tenancy, OS)
  • Payment Options – No Upfront (+discount), Partial Upfront (++discount), All Upfront (+++discount)
  • Reserved Instance’s Scope – Regional or Zonal (reserve capacity in an AZ)
  • Recommended for steady-state usage applications (think database)
  • You can buy and sell in the Reserved Instance Marketplace
  • Standard RI – change AZ, instance size (Linux), networking type – use the ModifyReservedInstances API
  • Convertible RI – change AZ, instance size (Linux), networking type + change family, OS, tenancy, payment option – use the ExchangeReservedInstances API

Savings Plans

  • Compute Savings Plans
    • 1 or 3-year hourly commitment to usage of Fargate, Lambda, and EC2
    • Any Region, family, size, tenancy, and OS
  • EC2 Instance Savings Plans
    • 1 or 3-year hourly commitment to usage of EC2 within a selected Region and Instance Family
    • Any size, tenancy, and OS

Spot Instances

  • The most cost-efficient instances in AWS
  • 2-minute warning if AWS needs to reclaim capacity – available via instance metadata and CloudWatch Events
  • Not suitable for critical jobs or databases
  • Useful for workloads that are resilient to failure
    • Batch jobs
    • Data analysis
    • Image processing
    • Any distributed workloads
    • Workloads with a flexible start and end time

Spot Instance Types

  • Spot Instance: One or more EC2 instances
  • Spot Fleet: launches and maintains the number of Spot / On-Demand instances to meet specified target capacity
  • EC2 Fleet: launches and maintains specified number of Spot / On-Demand / Reserved instances in a single API call
  • Spot Block: Uninterrupted for 1-6 hours; Pricing is 30% - 45% less than On-Demand

Dedicated Hosts

  • A physical server with EC2 instance capacity fully dedicated to your use
  • Allows you to address compliance requirements and use your existing server-bound software licenses (per-socket, per-core, per-VM software licenses)
  • The most expensive option
  • Purchasing Options:
    • On-demand – pay per second for active Dedicated Host
    • Reserved - 1 or 3 years (No Upfront | Partial Upfront | All Upfront)
  • Useful for software that has a complicated licensing model (BYOL – Bring Your Own License), or for companies with strong regulatory or compliance needs

Dedicated Instances

  • Instances run on hardware that’s dedicated to you - no other customers will share your hardware
  • May share hardware with other instances in same account
  • No control over instance placement (can move hardware after Stop / Start)

Capacity Reservations

  • Reserve On-Demand instance capacity in a specific AZ for any duration
  • You always have access to EC2 capacity when you need it
  • no time commitment (create/cancel anytime), no billing discounts
  • Combine with Regional Reserved Instances and Savings Plans to benefit from billing discounts
  • You’re charged at On-Demand rate whether you run instances or not
  • Suitable for short-term, uninterrupted workloads that need to be in a specific AZ

Spot Fleets

Spot Fleets = set of Spot Instances + (optional) On-Demand Instances

  • The Spot Fleet will try to meet the target capacity with price constraints
    • Define possible launch pools: instance type (e.g. m5.large), OS, Availability Zone
    • Can have multiple launch pools, so that the fleet can choose
    • Spot Fleet stops launching instances when reaching capacity or max cost
  • Strategies to allocate Spot Instances:
    • lowestPrice: from the pool with the lowest price (cost optimization, short workload)
    • diversified: distributed across all pools (great for availability, long workloads)
    • capacityOptimized: pool with the optimal capacity for the number of instances
  • Spot Fleets allow us to automatically request Spot Instances with the lowest price
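
A hedged sketch of a Spot Fleet request using the lowestPrice strategy (the fleet role ARN and AMI ID are placeholders):

cat > spot-fleet-config.json <<'EOF'
{
  "IamFleetRole": "arn:aws:iam::123456789012:role/my-spot-fleet-role",
  "AllocationStrategy": "lowestPrice",
  "TargetCapacity": 4,
  "LaunchSpecifications": [
    { "ImageId": "ami-12345678", "InstanceType": "m5.large" },
    { "ImageId": "ami-12345678", "InstanceType": "m5a.large" }
  ]
}
EOF

aws ec2 request-spot-fleet --spot-fleet-request-config file://spot-fleet-config.json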

AMI

AMI: Amazon Machine Image

  • An AMI is a customization of an EC2 instance
    • You add your own software, configuration, operating system, monitoring...
    • Faster boot / configuration time because all your software is pre-packaged
  • AMIs are built for a specific region (and can be copied across regions)
  • You can launch EC2 instances from:
    • Public AMIs: AWS provided, free to use, generally you just select the operating system you want
    • AWS Marketplace AMIs: an AMI someone else made (and potentially sells) - generally come packaged with additional, licensed software
    • Your own AMI: you make and maintain them yourself
  • AMI Process (from an EC2 instance; see the CLI sketch after this list)
    • Start an EC2 instance and customize it
    • Stop the instance (for data integrity)
    • Build an AMI – this will also create EBS snapshots
    • Launch instances from that AMI
  • An AMI includes the following:
    • One or more EBS snapshots OR for instance-store-backed AMIs - a template for the root volume of the instance
    • Launch permissions that control which AWS accounts can use the AMI to launch instances
    • A block device mapping that specifies the volumes to attach to the instance when it's launched
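
For example, building an AMI and copying it to another region from the CLI (the IDs and names are placeholders):

# Build an AMI from the (ideally stopped) instance
aws ec2 create-image --instance-id i-0123456789abcdef0 \
  --name "my-golden-ami" --description "Pre-configured web server"

# Copy it to another region
aws ec2 copy-image --source-region us-east-1 --source-image-id ami-12345678 \
  --region eu-west-1 --name "my-golden-ami"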

Tip

It's not possible to move an existing instance to another subnet, Availability Zone, or VPC.

Instead, you can manually migrate the instance by creating a new Amazon Machine Image (AMI) from the source instance.
Then, launch a new instance using the new AMI in the desired subnet, Availability Zone, or VPC.
Finally, you can reassign any Elastic IP addresses from the source instance to the new instance.

EC2 Image Builder

  • Used to automate the creation of virtual machine or container images
  • Automates the creation, maintenance, validation, and testing of EC2 AMIs
  • Can be run on a schedule (weekly, whenever packages are updated, etc...)
  • Free service (only pay for the underlying resources)

EC2 Instance Store

high-performance hardware disk

  • Better I/O performance
  • EC2 Instance Store volumes lose their storage if the instance is stopped (ephemeral)
  • Good for buffer / cache / scratch data / temporary content
  • Risk of data loss if hardware fails
  • Backups and Replication are your responsibility
  • Instance Store volumes are physically attached to the host

Placement Groups

Control over the EC2 Instance placement strategy

🔗EC2 Placement Groups Documentation

Cluster

clusters instances into a low-latency group in a single Availability Zone

  • Pros: Great network (10 Gbps bandwidth between instances with Enhanced Networking enabled - recommended)
  • Cons: If the rack fails, all instances fail at the same time
  • Use case:
    • Big Data job that needs to complete fast
    • Application that needs extremely low latency and high network throughput

Spread

spreads instances across underlying hardware (max 7 instances per group per AZ)

  • Pros:
    • Can span across Availability Zones (AZs)
    • Reduced risk of simultaneous failure
    • EC2 Instances are on different physical hardware
  • Cons:
    • Limited to 7 instances per AZ per placement group
  • Use case:
    • Application that needs to maximize high availability
    • Critical Applications where each instance must be isolated from failure of the others

Partition

spreads instances across many different partitions (which rely on different sets of racks) within an AZ. Scales to 100s of EC2 instances per group (Hadoop, Cassandra, Kafka)

  • Up to 7 partitions per AZ
  • Can span across multiple AZs
  • Up to 100s of EC2 instances
  • The instances in a partition do not share racks with the instances in the other partitions
  • A partition failure can affect many EC2 but won’t affect other partitions
  • EC2 instances get access to the partition information as metadata
  • Use cases: HDFS, HBase, Cassandra, Kafka

Rules and Limitations

  • Placement groups can't cross regions
  • An instance can be launched in one placement group at a time; it cannot span multiple placement groups
  • You can't merge placement groups
  • A cluster placement group can't span multiple Availability Zones
  • A partition placement group supports a maximum of 7 partitions per Availability Zone
  • A rack spread placement group supports a maximum of 7 running instances per Availability Zone

Network Interfaces (ENI, ENA, EFA)

ENI - Elastic Network Interface

Logical component in a VPC that represents a virtual network card

  • The ENI can have the following attributes:
    • Primary private IPv4, one or more secondary IPv4
    • One Elastic IP (IPv4) per private IPv4
    • One Public IPv4 (optional)
    • One or more security groups
    • A MAC address
  • You can create ENIs independently and attach them on the fly (move them) across EC2 instances for failover (see the CLI sketch after this list)
  • Additional ENIs can be attached from subnets within the same AZ
  • The primary network interface has a private IP and optionally a public IP
  • You cannot attach ENIs from subnets in different AZs
  • Bound to a specific availability zone (AZ)
  • Can use with all instance types
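
A hedged sketch of the failover pattern with the CLI (all IDs are placeholders):

# Create an ENI in a subnet
aws ec2 create-network-interface --subnet-id subnet-0123456789abcdef0 \
  --groups sg-0123456789abcdef0 --description "failover ENI"

# Attach it to an instance as a secondary interface
aws ec2 attach-network-interface --network-interface-id eni-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 --device-index 1

# Detach it (then re-attach to a standby instance for failover)
aws ec2 detach-network-interface --attachment-id eni-attach-0123456789abcdef0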

ENA - Elastic Network Adapter

  • Enhanced networking performance
  • Higher bandwidth and lower inter-instance latency
  • Must choose supported instance type

EFA - Elastic Fabric Adapter

  • Used for High Performance Computing (HPC), MPI, and ML use cases
  • Tightly coupled applications
  • Must choose a supported instance type

Public, Private and Elastic IP addresses

  • Public IP Address:
    • Lost when the instance is stopped
    • Used in Public Subnets
    • No charge
    • Associated with a private IP address on the instance
    • Cannot be moved between instances
  • Private IP Address:
    • Retained when the instance is stopped
    • Used in Public and Private Subnets
  • Elastic IP Address:
    • Static Public IP address
    • You are charged if not used
    • Associated with a private IP address on the instance
    • Can be moved between instances and elastic network interfaces

AWS Nitro System

Nitro is the underlying platform for the next generation of EC2 instances

  • Support for many virtualized and bare metal instance types
  • Breaks functions into specialized hardware with a Nitro Hypervisor
  • Specialized hardware includes:
    • Nitro cards for VPC
    • Nitro cards for EBS
    • Nitro for Instance Storage
    • Nitro card controller
    • Nitro security chip
    • Nitro hypervisor
    • Nitro Enclaves
  • Improves performance, security and innovation:
    • Performance close to bare metal for virtualized instances
    • Elastic Network Adapter and Elastic Fabric Adapter
    • More bare metal instance types
    • Higher network performance (e.g. 100 Gbps)
    • High Performance Computing (HPC) optimizations
    • Dense storage instances (e.g. 60 TB)

AWS Nitro Enclaves


  • Isolated compute environments
  • Runs on isolated and hardened virtual machines
  • No persistent storage, interactive access, or external networking
  • Uses cryptographic attestation to ensure only authorized code is running
  • Integrates with AWS Key Management Service (KMS)
  • Protect and securely process highly sensitive data:
    • Personally identifiable information (PII)
    • Healthcare data
    • Financial data
    • Intellectual Property data

EBS

An EBS (Elastic Block Store) Volume is a network drive you can attach to your instances while they run

  • It’s a network drive (i.e. not a physical drive)
    • It uses the network to communicate with the instance, which means there might be a bit of latency
    • It can be detached from an EC2 instance and attached to another one quickly
  • It’s locked to an Availability Zone (AZ)
    • To move a volume across AZs, you first need to snapshot it
  • Have a provisioned capacity (size in GBs, and IOPS)
    • You get billed for all the provisioned capacity
    • You can increase the capacity of the drive over time
  • Delete on Termination attribute:
    • Controls the EBS behaviour when an EC2 instance terminates
    • By default, the root EBS volume is deleted (attribute enabled)
    • By default, any other attached EBS volume is not deleted (attribute disabled)
    • This can be controlled by the AWS console / AWS CLI

Resizing EBS Volumes

After you increase the size of an EBS volume, use the Windows Disk Management utility or PowerShell to extend the disk size to the new size of the volume.
You can begin resizing the file system as soon as the volume enters the optimizing state.
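
On Linux, the equivalent steps might look like this (the device and volume IDs are placeholders; use xfs_growfs or resize2fs depending on the file system):

# Grow the volume itself
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --size 200

# On the instance: extend the partition, then the file system
sudo growpart /dev/xvda 1
sudo xfs_growfs -d /          # XFS
# sudo resize2fs /dev/xvda1   # ext4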

🔗Official Reference

EBS Snapshots

🔗Documentation

  • Make a backup (snapshot) of your EBS volume at a point in time
  • Not necessary to detach volume to do snapshot, but recommended
  • Can copy snapshots across AZs or Regions (see the CLI sketch after this list)
  • EBS Snapshots Features:
    • EBS Snapshot Archive
      • Move a Snapshot to an 'archive tier' that is much cheaper
      • Takes within 24 to 72 hours for restoring the archive
    • Recycle Bin for EBS Snapshots
      • Setup rules to retain deleted snapshots so you can recover them after an accidental deletion
      • Specify retention (from 1 day to 1 year)
    • Fast Snapshot Restore (FSR)
      • Force full initialization of a snapshot to have no latency on the first use ($$$)
      • Fast Snapshot restore (FSR) needs to be enabled per Availability Zone
      • This can be enabled on both existing as well as new Snapshots
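
For example, creating a snapshot and copying it to another region (the IDs and regions are placeholders):

# Snapshot a volume (detaching it first is recommended but not required)
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
  --description "nightly backup"

# Copy the snapshot to another region
aws ec2 copy-snapshot --source-region us-east-1 \
  --source-snapshot-id snap-0123456789abcdef0 --region eu-west-1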

Tip

Amazon EBS direct APIs can be used to create EBS snapshots, write data directly to snapshots, read data from snapshots, and identify the difference between two snapshots.

Note

There can be an initial performance hit when an Amazon EBS volume is created from a snapshot. This can be avoided in either of the following ways:

  • Force the immediate initialization of the entire volume.
  • Enable fast snapshot restore on a snapshot to ensure that the EBS volumes are fully initialized.

EBS Volume Types

  • EBS Volumes come in 6 types
    • gp2 / gp3 (SSD): General purpose SSD volume that balances price and performance for a wide variety of workloads
    • io1 / io2 (SSD): Highest-performance SSD volume for mission-critical low-latency or high-throughput workloads
    • st1 (HDD): Low cost HDD volume designed for frequently accessed, throughput-intensive workloads
    • sc1 (HDD): Lowest cost HDD volume designed for less frequently accessed workloads
  • EBS Volumes are characterized in Size | Throughput | IOPS (I/O Ops Per Sec)
  • Only gp2/gp3 and io1/io2 can be used as boot volumes


EBS vs Instance Store

  • EBS volumes are attached over the network, while Instance Store volumes are physically attached to the host and offer high performance
  • EBS volumes offer persistent storage, while Instance Store volumes are ephemeral(non-persistent)

EBS Multi-Attach – io1/io2 family

  • Attach the same EBS volume to multiple EC2 instances in the same AZ
  • Available for Nitro system-based EC2 instances
  • Up to 16 instances can be attached to a single volume
  • Each instance has full read & write permissions to the volume
  • Use case:
    • Achieve higher application availability in clustered Linux applications
    • Applications must manage concurrent write operations
  • Must use a file system that’s cluster-aware (not XFS, ext4, etc...)

Copying and Sharing AMIs and Snapshots

  • Encrypted Snapshot -> Encrypted AMI
    • Can be shared with other accounts (custom key only)
    • Cannot be shared publicly
  • Encrypted AMI -> Encrypted AMI
    • Can change encryption key
    • Can change region
  • Encrypted AMI -> EC2 Instance
    • Can change encryption key
    • Can change AZ
  • Unencrypted AMI -> EC2 Instance
    • Can change encryption state
    • Can change AZ
  • Encrypted Snapshot -> Encrypted Volume
    • Can be encrypted
    • Can change AZ

Amazon Data Lifecycle Manager (DLM)

  • DLM automates the creation, retention, and deletion of EBS snapshots and EBS-backed AMIs
  • DLM helps with the following:
    • Protects valuable data by enforcing a regular backup schedule
    • Create standardized AMIs that can be refreshed at regular intervals
    • Retain backups as required by auditors or internal compliance
    • Reduce storage costs by deleting outdated backups
    • Create disaster recovery backup policies that back up data to isolated accounts

EBS Encryption

  • When you create an encrypted EBS volume, you get the following:
    • Data at rest is encrypted inside the volume
    • All the data in flight moving between the instance and the volume is encrypted
    • All snapshots are encrypted
    • All volumes created from the snapshot are encrypted
  • Encryption and decryption are handled transparently (you have nothing to do)
  • Encryption has a minimal impact on latency
  • EBS Encryption leverages keys from KMS (AES-256)
  • Copying an unencrypted snapshot allows encryption
  • Snapshots of encrypted volumes are encrypted

Using RAID with EBS

RAID stands for Redundant Array of Independent Disks

  • Not provided by AWS, you must configure through your operating system
  • RAID 0 and RAID 1 are potential options on EBS
    • RAID 0 is used for striping data across disks (performance)
    • RAID 1 is used for mirroring data across disks (redundancy / fault tolerance)
  • RAID 5 and RAID 6 are not recommended by AWS

EFS

EFS: Elastic File System

  • Managed NFS (network file system) that can be simultaneously mounted on thousands of EC2
  • File server appliances of this kind are known as Network Attached Storage (NAS) devices, serving both SMB / CIFS and NFS
  • Can connect instances from other VPCs and on-premises computers
  • EFS works with Linux EC2 instances across multiple AZs – a POSIX file system with a standard file API
  • Encryption at rest using KMS can be enabled when creating the file system
  • Encryption during transit can be enabled when mounting the file system using the Amazon EFS mount helper. The mount helper uses TLS version 1.2 to communicate with the file system.
  • Highly available, scalable, expensive (3x gp2), pay per use, no capacity planning
  • EFS Infrequent Access (EFS-IA): Storage class that is cost-optimized for files not accessed every day
  • Use cases:
    • content management
    • web serving
    • data sharing
    • Wordpress

Tip

Once you create an EFS file system, you cannot change its encryption setting.
This means that you cannot modify an unencrypted file system to make it encrypted. Instead, you need to create a new, encrypted file system.

Tip

Mount command: mount -t nfs servername:folderpath /mountpoint

EFS Performance

  • EFS Scale
    • 1000s of concurrent NFS clients, 10+ GB/s throughput
    • Grow to Petabyte-scale network file system, automatically
  • Performance mode (set at EFS creation time)
    • General purpose (default): latency-sensitive use cases (web server, CMS, etc...)
    • Max I/O – higher latency and throughput, highly parallel (big data, media processing, etc...)
  • Throughput mode
    • Bursting (1 TB = 50 MiB/s + burst of up to 100 MiB/s)
    • Provisioned: set your throughput regardless of storage size, e.g. 1 GiB/s for 1 TB storage

EFS Storage Classes

  • Storage Tiers (lifecycle management feature – move file after N days)
    • Standard: for frequently accessed files
    • Infrequent access (EFS-IA): cost to retrieve files, lower price to store. Enable EFS-IA with a Lifecycle Policy
  • Availability and durability
    • Standard: Multi-AZ, great for prod
    • One Zone: One-AZ, great for dev, backup enabled by default, compatible with IA (EFS One Zone-IA)

Tip

Amazon EFS Standard-IA storage class can be used to store data that is infrequently accessed but requires high availability and durability.
With the Amazon EFS Standard-IA storage class, data is stored redundantly across multiple AZs.

Mount Targets

  • A mount target provides an IP address for an NFSv4 endpoint at which you can mount an Amazon EFS file system.
  • You mount your file system using its Domain Name Service (DNS) name, which resolves to the IP address of the EFS mount target in the same Availability Zone as your EC2 instance.
  • You can create one mount target in each Availability Zone in an AWS Region
  • If there are multiple subnets in an Availability Zone in your VPC, you create a mount target in one of the subnets. Then all EC2 instances in that Availability Zone share that mount target.
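
For example, mounting through a mount target's DNS name over NFSv4.1 (the file system ID and region are placeholders):

sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /mnt/efs

# Or, with the EFS mount helper (amazon-efs-utils), which can add TLS:
sudo mount -t efs -o tls fs-0123456789abcdef0:/ /mnt/efs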

Amazon FSx

  • Launch fully managed 3rd party high-performance file systems on AWS
  • Fully managed service
  • Amazon FSx for Windows File Server: A fully managed, highly reliable, and scalable native Windows shared file system
    • Built on Windows File Server
    • Fully supports the SMB protocol & Windows NTFS
    • Integrated with Microsoft Active Directory
    • Can be accessed from AWS or your on-premises infrastructure
    • Supports Windows-native file system features:
      • Access Control Lists (ACLs), shadow copies, and user quotas
      • NTFS file systems that can be accessed from up to thousands of compute instances using the SMB protocol
    • High availability: replicates data within an Availability Zone (AZ)
    • Multi-AZ: file systems include an active and standby file server in separate AZs
  • Amazon FSx for Lustre: A fully managed, high-performance, scalable file storage for High Performance Computing (HPC)
    • High-performance file system optimized for fast processing of workloads such as:
      • Machine learning
      • High performance computing (HPC)
      • Video processing
      • Financial modeling
      • Electronic design automation (EDA)
    • Works natively with S3, letting you transparently access your S3 objects as files
    • Your S3 objects are presented as files in your file system, and you can write your results back to S3
    • Provides a POSIX-compliant file system interface

Tip

The name Lustre is derived from "Linux" and "cluster"

Note

With Multi-AZ deployment, active and standby servers are placed in separate AZs. Data written to the active server is synchronously replicated to the standby server.
With Synchronous replication, data is written to active and standby servers simultaneously, while with asynchronous replication, there might be a lag between data written to active servers and standby servers.
Synchronous Replication is advantageous during failover where standby servers are in sync with active servers.

ELB & ASG

ELB: Elastic Load Balancing
ASG: Auto Scaling Group

High Availability & Scalability

  • Vertical Scaling: Increase instance size (scale up / down)
  • Horizontal Scaling: Increase number of instances (scale out / in)
  • High Availability:
    • Usually goes hand in hand with horizontal scaling
    • Run instances for the same application across multi AZ
      • Auto Scaling Group multi AZ
      • Load Balancer multi AZ
    • The goal of high availability is to survive a data center loss
    • High availability can be passive (e.g. RDS Multi-AZ)
    • High availability can be active (for horizontal scaling)

Scalability vs Elasticity vs Agility

  • Scalability: ability to accommodate a larger load by making the hardware stronger (scale up), or by adding nodes (scale out)
  • Elasticity: once a system is scalable, elasticity means that there will be some "auto-scaling" so that the system can scale based on the load. This is "cloud-friendly": pay-per-use, match demand, optimize costs
  • Agility: new IT resources are only a click away, which means that you reduce the time to make those resources available to your developers from weeks to just minutes.

Elastic Load Balancer

  • An ELB (Elastic Load Balancer) is a managed load balancer
    • AWS guarantees that it will be working
    • AWS takes care of upgrades, maintenance, high availability
    • AWS provides only a few configuration knobs
  • It costs less to set up your own load balancer, but it will be a lot more effort on your end (maintenance, integrations)

Application Load Balancer

  • Operates at the request level
  • Layer 7 (HTTP / HTTPS / gRPC) only
  • Load balancing to multiple applications across machines (target groups)
  • Load balancing to multiple applications on the same machine (ex: containers)
  • Support for HTTP/2 and WebSocket
  • Support redirects (from HTTP to HTTPS for example)
  • Routing tables to different target groups (see the CLI sketch after this list):
    • Routing based on path in URL (example.com/users & example.com/posts)
    • Routing based on hostname in URL (one.example.com & other.example.com)
    • Routing based on Query String, Headers (example.com/users?id=123&order=false)
    • Routing based on source IP
  • A great fit for microservices & container-based applications (example: Docker & Amazon ECS)
  • Has a port mapping feature to redirect to a dynamic port in ECS
  • Target Groups:
    • EC2 instances (can be managed by an Auto Scaling Group) – HTTP
    • ECS tasks (managed by ECS itself) – HTTP
    • Lambda functions – HTTP request is translated into a JSON event
    • IP Addresses – must be private IPs
  • Fixed hostname (XXX.<region>.elb.amazonaws.com)
  • The application servers don’t see the IP of the client directly
    • The true IP of the client is inserted in the header X-Forwarded-For
    • We can also get Port (X-Forwarded-Port) and Proto (X-Forwarded-Proto)
  • Use Cases:
    • Web applications with L7 routing (HTTP/HTTPS)
    • Microservices architectures (e.g. Docker containers)
    • Lambda targets
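
A hedged sketch of a path-based routing rule on an ALB listener (the ARNs are placeholders):

aws elbv2 create-rule \
  --listener-arn arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/abc/def \
  --priority 10 \
  --conditions Field=path-pattern,Values='/users/*' \
  --actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/users-tg/ghi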

Network Load Balancer

  • Operates at the connection level
  • Ultra-high performance, allows for TCP(Layer 4):
    • Forward TCP & UDP traffic to your instances
    • Handles millions of requests per second
    • Lower latency (~100 ms)
  • Requests are routed based on IP protocol data
  • NLB has one static IP per AZ, and supports assigning Elastic IP (helpful for whitelisting specific IP)
  • NLB nodes can have elastic IPs in each subnet
  • NLB are used for extreme performance, TCP or UDP traffic
  • NLBs listen on TCP, TLS, UDP or TCP_UDP
  • A separate listener on a unique port is required for routing
  • Not included in the AWS free tier
  • Health Checks support the TCP, HTTP and HTTPS Protocols
  • Target Groups
    • EC2 instances
    • IP Addresses – must be private IPs
    • Application Load Balancer
    • On-Premises
  • Use Cases:
    • TCP and UDP based applications
    • Ultra-low latency
    • Static IP addresses
    • VPC endpoint services

Gateway Load Balancer

  • Used in front of virtual appliances
  • Example: Firewalls, Intrusion Detection and Prevention Systems, Deep Packet Inspection Systems, payload manipulation, ...
  • Operates at Layer 3 (Network Layer) – listens for all packets on all ports
  • Forwards traffic to the TG specified in the listener rules
  • Exchanges traffic with appliances using the GENEVE protocol on port 6081
  • Deploy, scale, and manage a fleet of 3rd party network virtual appliances in AWS
  • Combines the following functions:
    • Transparent Network Gateway – single entry/exit for all traffic
    • Load Balancer – distributes traffic to your virtual appliances
  • Target Groups
    • EC2 instances
    • IP Addresses – must be private IPs
  • Use Case:
    • Load balance virtual appliances such as:
      • Intrusion detection systems (IDS)
      • Intrusion prevention systems (IPS)
      • Next generation firewalls (NGFW)
      • Web application firewalls (WAF)
      • Distributed denial of service protection systems (DDoS)
    • Integrate with Auto Scaling groups for elasticity
    • Apply network monitoring and logging for analytics

Load Balancer - SSL Certificates

Note

  • An SSL Certificate allows traffic between your clients and your load balancer to be encrypted in transit (in-flight encryption)
  • SSL refers to Secure Sockets Layer, used to encrypt connections
  • TLS refers to Transport Layer Security, which is a newer version
  • Nowadays, TLS certificates are mainly used, but people still say SSL
  • Public SSL certificates are issued by Certificate Authorities (CA)
    • Comodo, Symantec, GoDaddy, GlobalSign, Digicert, Letsencrypt, etc...
  • SSL certificates have an expiration date (you set) and must be renewed
  • The load balancer uses an X.509 certificate (SSL/TLS server certificate)
  • You can manage certificates using ACM (AWS Certificate Manager)
  • Alternatively, you can create and upload your own certificates
  • HTTPS listener:
    • You must specify a default certificate
    • You can add an optional list of certs to support multiple domains
    • Clients can use SNI (Server Name Indication) to specify the hostname they reach
    • Ability to specify a security policy to support older versions of SSL /TLS (legacy clients)

SSL – Server Name Indication (SNI)

SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)

  • It’s a "newer" protocol, and requires the client to indicate the hostname of the target server in the initial SSL handshake
  • The server will then find the correct certificate, or return the default one
  • Note:
    • Only works for ALB & NLB (newer generation), CloudFront
    • Does not work for CLB (older gen)

Cross-Zone Load Balancing

  • Application Load Balancer
    • Always on (can’t be disabled)
    • No charges for inter-AZ data
  • Network Load Balancer
    • Disabled by default
    • You pay charges ($) for inter-AZ data if enabled

Connection Draining

Time to complete "in-flight requests" while the instance is de-registering or unhealthy

  • Stops sending new requests to the EC2 instance which is de-registering
  • Between 1 to 3600 seconds (default: 300 seconds)
  • Can be disabled (set value to 0)
  • Set to a low value if your requests are short

ELB Access Logs

ELB access logs are an optional feature that can be used to troubleshoot traffic patterns and issues with traffic as it hits the ELB.

  • ELB access logs capture details of requests sent to your load balancer such as:
    • The time of the request
    • The client IP
    • Latency
    • Server responses
  • Access logs are stored in an S3 bucket. Log files are published every five minutes, and multiple logs can be published for the same five-minute period.
  • ELB access logs also include HTTP response codes from the target.
  • Disabled by default

Note

  • The S3 bucket must be in the same region as the ELB.
  • The bucket policy must be configured to allow access logs to write to the bucket.
  • You can use tools such as Amazon Athena, Loggly, Splunk, or Sumo Logic to analyze the contents of ELB access logs.

Auto Scaling Group

  • In real-life, the load on your websites and application can change
  • In the cloud, you can create and get rid of servers very quickly
  • The goal of an Auto Scaling Group (ASG) is to:
    • Scale out (add EC2 instances) to match an increased load
    • Scale in (remove EC2 instances) to match a decreased load
    • Ensure we have a minimum and a maximum number of machines running
    • Automatically register new instances to a load balancer
    • Replace unhealthy instances
  • Cost Savings: only run at an optimal capacity (principle of the cloud)

Note

Automatically scale AWS services including:

  • Amazon EC2 – Launch or terminate EC2 instances
  • Amazon EC2 Spot Fleets
  • Amazon ECS – Adjust ECS service desired count up/down
  • Amazon DynamoDB – increase provisioned RCUs/WCUs
  • Amazon Aurora – Adjust the number of Read Replicas

Tip

An ASG is region-bound; you can’t span it across regions

Configuration of an Auto Scaling Group

  • A Launch Template specifies the EC2 instance configuration:
    • AMI + InstanceType
    • EC2 User Data
    • EBS Volumes
    • Security Groups
    • SSH Key Pair
    • IAM Roles for your EC2 Instances
    • Network + Subnets Information
    • Load Balancer Information
  • Configure purchase options – On-demand vs Spot
  • Configure VPC and Subnets
  • Attach Load Balancer
  • Configure health checks for EC2 & ELB
  • Group size and scaling policies
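
A minimal sketch of creating an ASG from a launch template and attaching it to a target group (the names, IDs, and ARN are placeholders):

aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name my-asg \
  --launch-template LaunchTemplateName=my-template,Version='$Latest' \
  --min-size 1 --max-size 5 --desired-capacity 2 \
  --vpc-zone-identifier "subnet-0123456789abcdef0,subnet-0fedcba9876543210" \
  --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc \
  --health-check-type ELB --health-check-grace-period 300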

Monitoring

  • It is possible to scale an ASG based on CloudWatch alarms
  • An alarm monitors a metric (such as Average CPU, or a custom metric)
  • Metrics such as Average CPU are computed for the overall ASG instances

Note

  • Group metrics (ASG)
    • Data points about the Auto Scaling group
    • 1-minute granularity
    • No charge
    • Must be enabled
  • Basic monitoring (Instances)
    • 5-minute granularity
    • No Charge
  • Detailed monitoring (Instances)
    • 1-minute granularity
    • Charges apply

Scaling Strategies

  • Manual Scaling: Update the size of an ASG manually
  • Dynamic Scaling: Respond to changing demand
    • Target Tracking Scaling
      • Simplest and easiest to set up
      • Example: keep the average ASG CPU at around 40% (see the CLI sketch after this list)
    • Simple / Step Scaling
      • When a CloudWatch alarm is triggered (example CPU > 70%), then add 2 units
      • When a CloudWatch alarm is triggered (example CPU < 30%), then remove 1 unit
    • Scheduled Scaling
      • Anticipate a scaling based on known usage patterns
      • Example: increase the min capacity to 10 at 5 pm on Fridays
  • Predictive Scaling: Uses Machine Learning to predict future traffic ahead of time

Tip

Target tracking suits unpredictable workloads; for known usage patterns, prefer scheduled scaling

Additional Scaling Settings

  • Cooldowns – Used with simple scaling policy to prevent Auto Scaling from launching or terminating before effects of previous activities are visible. Default value is 300 seconds (5 minutes)
  • Termination Policy – Controls which instances to terminate first when a scale-in event occurs
  • Termination Protection – Prevents Auto Scaling from terminating protected instances
  • Standby State – Used to put an instance in the InService state into the Standby state, update or troubleshoot the instance
  • Lifecycle Hooks – Used to perform custom actions by pausing instances as the ASG launches or terminates them


Global Applications

Route 53

DNS Terminologies

  • Domain Registrar: Amazon Route 53, GoDaddy, ...
  • DNS Records: A, AAAA, CNAME, NS, ...
  • Zone File: contains DNS records
  • Name Server: resolves DNS queries (Authoritative or Non-Authoritative)
  • Top Level Domain (TLD): .com, .us, .in, .gov, .org, ...
  • Second Level Domain (SLD): amazon.com, google.com, ...


Amazon Route 53

  • A highly available, scalable, fully managed and Authoritative DNS
    • Authoritative = the customer (you) can update the DNS records
  • Route 53 is also a Domain Registrar
  • Ability to check the health of your resources
  • The only AWS service that provides a 100% availability SLA
  • 53 is a reference to the traditional DNS port
  • Features:
    • Domain Registration
    • Hosted Zone
    • Health Checks
    • Traffic Flow

Route 53 – Records

  • How you want to route traffic for a domain
  • Each record contains:
    • Domain/subdomain Name
    • Record Type
    • Value
    • Routing Policy – how Route 53 responds to queries
    • TTL – the amount of time the record is cached at DNS resolvers
  • Route 53 supports the following DNS record types:
    • (must know) A / AAAA / CNAME / NS
      • A – maps a hostname to IPv4
      • AAAA – maps a hostname to IPv6
      • CNAME – maps a hostname to another hostname
        • The target is a domain name which must have an A or AAAA record
        • Can’t create a CNAME record for the top node of a DNS namespace (Zone Apex) -> use Alias instead
        • Example: you can’t create a CNAME for example.com, but you can for www.example.com
      • NS – Name Servers for the Hosted Zone
        • Control how traffic is routed for a domain
    • (advanced) CAA / DS / MX / NAPTR / PTR / SOA / TXT / SPF / SRV
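
As a concrete example, creating (or updating) an A record from the CLI (the hosted zone ID, name, and IP are placeholders):

cat > change-batch.json <<'EOF'
{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "app.example.com",
      "Type": "A",
      "TTL": 300,
      "ResourceRecords": [{ "Value": "203.0.113.10" }]
    }
  }]
}
EOF

aws route53 change-resource-record-sets --hosted-zone-id Z0123456789ABC \
  --change-batch file://change-batch.json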


Route 53 – Hosted Zones

  • A container for records that define how to route traffic to a domain and its subdomains
  • Public Hosted Zones
    • contains records that specify how to route traffic on the Internet (public domain names)
    • e.g. application1.mypublicdomain.com
  • Private Hosted Zones
    • contain records that specify how you route traffic within one or more VPCs (private domain names)
    • e.g. application1.company.internal
  • You pay $0.50 per month per hosted zone

Note

When you create a hosted zone, Route 53 automatically creates a name server (NS) record and a start of authority (SOA) record for the zone.

Migration to/from Route 53

  • You can migrate from another DNS provider and can import records
  • You can migrate a hosted zone to another AWS account
  • You can migrate from Route 53 to another registrar
  • You can also associate a Route 53 hosted zone with a VPC in another account
    • Authorize association with VPC in the second account
    • Create an association in the second account

Records TTL (Time To Live)

  • High TTL – e.g., 24 hr
    • Less traffic on Route 53
    • Possibly outdated records
  • Low TTL – e.g., 60 sec.
    • More traffic on Route 53 ($$)
    • Records are outdated for less time
    • Easy to change records

Note

Except for Alias records, TTL is mandatory for each DNS record

CNAME vs Alias

  • CNAME:
    • Points a hostname to any other hostname (acme.example.com to zenith.example.com or to acme.example.org)
    • Can redirect DNS queries to any DNS record
    • Only for non-root domains (e.g. something.mydomain.com)
  • Alias:
    • Points a hostname to an AWS Resource (app.mydomain.com => blabla.amazonaws.com)
    • Works for both root and non-root domains (e.g. mydomain.com)
    • An extension to DNS functionality
    • Automatically recognizes changes in the resource's IP addresses
    • Alias records are always of type A/AAAA for AWS resources
    • You can’t set the TTL
    • Free of charge
    • Native health check
  • Alias Record Targets:
    • Elastic Load Balancers
    • CloudFront Distributions
    • API Gateway
    • Elastic Beanstalk environments
    • S3 Websites
    • VPC Interface Endpoints
    • Global Accelerator accelerator
    • Route 53 record in the same hosted zone
  • You can't set an ALIAS record for an EC2 DNS name

Route 53 – Routing Policies

Note

  • Define how Route 53 responds to DNS queries
  • Routing:
    • It’s not the same as Load balancer routing which routes the traffic
    • DNS does not route any traffic, it only responds to the DNS queries

Route 53 Supports the following Routing Policies:

  • Simple
    • DNS response providing the IP address associated with a name
    • Can specify multiple values in the same record
    • If multiple values are returned, a random one is chosen by the client
    • When Alias is enabled, specify only one AWS resource
    • Can’t be associated with Health Checks
  • Weighted
    • Uses relative weights (an integer between 0 and 255) assigned to resources to determine which to route to
    • Assign each record a relative weight (weights don’t need to sum up to 100)
    • DNS records must have the same name and type
    • Can be associated with Health Checks
    • Use cases: load balancing between regions, testing new application versions...
    • Assign a weight of 0 to a record to stop sending traffic to a resource
    • If all records have a weight of 0, then all records will be returned equally
  • Failover
    • If the primary is down (based on health checks), routes to the secondary destination
    • A health check is required on the primary
  • Latency-based
    • Redirects to the resource that has the least latency for the user
    • Super helpful when latency for users is a priority
    • Latency is based on traffic between users and AWS Regions
    • Japan users may be directed to the US (if that’s the lowest latency)
    • Can be associated with Health Checks (has a failover capability)
  • Geolocation
    • Different from Latency-based!
    • Routing is based on user geographic location
    • Specify location by Continent, Country or by US State (if locations overlap, the most precise one is selected)
    • Should create a "Default" record (in case there’s no match on location)
    • Use cases: website localization, restrict content distribution, load balancing, ...
    • Can be associated with Health Checks
    • You can use geolocation routing to create records in a private hosted zone
  • Multi-Value Answer
    • Use when routing traffic to multiple resources
    • Returns several IP addresses and functions as a basic load balancer
    • Can be associated with Health Checks (returns only values for healthy resources)
    • Up to 8 healthy records are returned for each Multi-Value query
    • Multi-Value is not a substitute for having an ELB
  • Geoproximity (using the Route 53 Traffic Flow feature)
    • Route traffic to your resources based on the geographic location of users and resources
    • Ability to shift more traffic to resources based on the defined bias
    • You must use Route 53 Traffic Flow to use this feature

Route 53 – Calculated Health Checks

  • Combine the results of multiple Health Checks into a single Health Check
  • You can use OR, AND, or NOT
  • Can monitor up to 256 Child Health Checks
  • Specify how many of the health checks need to pass to make the parent pass
  • Usage: perform maintenance to your website without causing all health checks to fail

Amazon CloudFront

Content Delivery Network (CDN):

  • Improves read performance, content is cached at the edge
  • Improves user experience
  • 450+ Point of Presence globally (edge locations)
  • DDoS protection, integration with Shield, AWS Web Application Firewall

CloudFront Origin Access Control (OAC)

  • Similar to an Origin Access Identity (OAI) but supports additional use cases
  • AWS recommend the OAC for the following use cases:
    • Amazon S3 server-side encryption with AWS KMS (SSE-KMS)
    • All Amazon S3 buckets in all AWS Regions
    • Dynamic requests (PUT and DELETE) to Amazon S3
  • Requires an S3 bucket policy that allows the CloudFront service principal

Origins

  • Amazon S3 bucket
    • For distributing files and caching them at the edge
    • Enhanced security with CloudFront Origin Access Control (OAC)
    • OAC is replacing Origin Access Identity (OAI)
    • CloudFront can be used as an ingress (to upload files to S3)
  • Custom Origin (HTTP)
    • Application Load Balancer
    • EC2 instance
    • S3 website (must first enable the bucket as a static S3 website)
    • Any HTTP backend you want

Tip

The use cases for origin custom headers are:

  • Identifying requests from CloudFront
  • Determining which requests come from a particular distribution
  • Enabling cross-origin resource sharing (CORS)
  • Controlling access to content

If the header names and values that you specify are not already present in the viewer request, CloudFront adds them to the origin request.
If a header is present, CloudFront overwrites the header value before forwarding the request to the origin.

Geo Restriction

  • You can restrict who can access your distribution
    • Whitelist: Allow your users to access your content only if they're in one of the countries on a list of approved countries.
    • Blacklist: Prevent your users from accessing your content if they're in one of the countries on a blacklist of banned countries.
  • The "country" is determined using a 3rd party Geo-IP database
  • Use case: Copyright Laws to control access to content

CloudFront vs S3 Cross Region Replication

  • CloudFront:
    • Global Edge network
    • Files are cached for a TTL (maybe a day)
    • Great for static content that must be available everywhere
  • S3 Cross Region Replication:
    • Must be set up for each region you want replication to happen in
    • Files are updated in near real-time
    • Read only
    • Great for dynamic content that needs to be available at low latency in a few regions

Price Classes

  • You can reduce the number of edge locations for cost reduction
  • Three price classes:
    1. Price Class All: all regions – best performance
    2. Price Class 200: most regions, but excludes the most expensive regions
    3. Price Class 100: only the least expensive regions

CloudFront Caching

  • Cache based on
    • Headers
    • Session Cookies
    • Query String Parameters
  • The cache lives at each CloudFront Edge Location
  • You want to maximize the cache hit rate to minimize requests on the origin
  • Control the TTL (0 seconds to 1 year), can be set by the origin using the Cache-Control header, Expires header...
  • You can invalidate part of the cache using the CreateInvalidation API
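As a sketch of that API (the distribution ID and path are placeholders), an invalidation via boto3 looks like this:

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Invalidate every cached object under /images/ on one distribution
cloudfront.create_invalidation(
    DistributionId="EDFDVBD6EXAMPLE",  # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/images/*"]},
        "CallerReference": str(time.time()),  # idempotency token
    },
)
```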

Tip

  • You can define a maximum TTL and a default TTL
  • TTL is defined at the behavior level -> this can be used to define different TTLs for different file types
  • Headers can be used to control the cache:
    • Cache-Control max-age=(seconds) - specify how long CloudFront waits before getting the object again from the origin server
    • Expires – specify an expiration date and time
  • When the TTL on a file expires, CloudFront forwards the next incoming request to the origin to check whether the cached copy is still the latest version; if it is, the origin returns the status code 304 Not Modified

CloudFront Signed URL vs S3 Pre-Signed URL

  • CloudFront Signed URL
    • Allow access to a path, no matter the origin
    • Account-wide key-pair, only the root account can manage it
    • Can specify beginning and expiration date and time, and IP addresses/ranges of users
    • Can leverage caching features
  • S3 Pre-Signed URL
    • Issues a request as the person who pre-signed the URL
    • Uses the IAM key of the signing IAM principal
    • Limited lifetime

Note

Signed URLs should be used for individual files and clients that don’t support cookies

CloudFront Signed Cookies

  • Similar to Signed URLs
  • Use signed cookies when you don’t want to change URLs
  • Can also be used when you want to provide access to multiple restricted files (Signed URLs are for individual files)

CloudFront Access Logs

  • Contain detailed information about every user request that CloudFront receives at Edge Locations
  • Known as standard logs and stored in S3
  • Can log separately for different distributions
  • Can also enable real-time logs which are recorded in real time (within seconds)
  • Real-time logs can be used to monitor, analyze, and take action based on content delivery performance
  • Edge function logs record requests processed by Lambda@Edge and CloudFront Functions

Lambda@Edge

  • Lambda functions written in NodeJS or Python
  • Scales to 1000s of requests/second
  • Used to change CloudFront requests and responses:
    • Viewer Request – after CloudFront receives a request from a viewer
    • Origin Request – before CloudFront forwards the request to the origin
    • Origin Response – after CloudFront receives the response from the origin
    • Viewer Response – before CloudFront forwards the response to the viewer
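To illustrate one of these hooks, here is a minimal Lambda@Edge sketch in Python for the Origin Response event; the header value is just an example:

```python
# Minimal Lambda@Edge sketch (Origin Response event): adds a security
# header to every response before CloudFront caches and serves it.
def handler(event, context):
    # Lambda@Edge wraps the CloudFront record in event['Records'][0]['cf']
    response = event["Records"][0]["cf"]["response"]
    response["headers"]["strict-transport-security"] = [
        {"key": "Strict-Transport-Security",
         "value": "max-age=63072000; includeSubDomains"}
    ]
    return response
```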

Lambda@Edge

  • Use Cases:
    • Website Security and Privacy
    • Dynamic Web Application at the Edge
    • Search Engine Optimization (SEO)
    • Intelligently Route Across Origins and Data Centers
    • Bot Mitigation at the Edge
    • Real-time Image Transformation
    • A/B Testing
    • User Authentication and Authorization
    • User Prioritization
    • User Tracking and Analytics

S3 Transfer Acceleration

Increase transfer speed by transferring files to an AWS edge location, which forwards the data to the S3 bucket in the target region

AWS Global Accelerator

  • Improve global application availability and performance using the AWS global network
  • Leverages the AWS internal network to optimize the route to your application (up to 60% improvement)
  • 2 Anycast IPs are created for your application and traffic is sent through Edge Locations
  • The Edge Locations send the traffic to your application

Tip

  • Unicast IP: one server holds one IP address
  • Anycast IP: all servers hold the same IP address and the client is routed to the nearest one
  • Works with Elastic IP, EC2 instances, ALB, NLB, public or private
  • Consistent Performance
    • Intelligent routing to lowest latency and fast regional failover
    • No issue with client cache (because the IP doesn’t change)
    • Internal AWS network
  • Health Checks
    • Global Accelerator performs a health check of your applications
    • Helps make your application global (failover in less than 1 minute for unhealthy endpoints)
    • Great for disaster recovery (thanks to the health checks)
  • Security
    • Only 2 external IPs need to be whitelisted
    • DDoS protection thanks to AWS Shield

AWS Global Accelerator vs CloudFront

  • They both use the AWS global network and its edge locations around the world
  • Both services integrate with AWS Shield for DDoS protection
  • CloudFront
    • Improves performance for your cacheable content (such as images and videos)
    • Dynamic content (such as API acceleration and dynamic site delivery)
    • Content is served at the edge
  • Global Accelerator
    • Improves performance for a wide range of applications over TCP or UDP
    • No caching, proxying packets at the edge to applications running in one or more AWS Regions
    • Good fit for non-HTTP use cases, such as gaming (UDP), IoT (MQTT), or Voice over IP
    • Good for HTTP use cases that require static IP addresses, or deterministic, fast regional failover

AWS Outposts

  • Hybrid Cloud: businesses that keep an on-premises infrastructure alongside a cloud infrastructure
  • Therefore, two ways of dealing with IT systems:
    • One for the AWS cloud (using the AWS console, CLI, and AWS APIs)
    • One for their on-premises infrastructure
  • AWS Outposts are "server racks" that offer the same AWS infrastructure, services, APIs & tools to build your own applications on-premises just as in the cloud
  • AWS sets up and manages "Outposts Racks" within your on-premises infrastructure, and you can start leveraging AWS services on-premises
  • You are responsible for the Outposts Rack physical security
  • Benefits:
    • Low-latency access to on-premises systems
    • Local data processing
    • Data residency
    • Easier migration from on-premises to the cloud
    • Fully managed service

AWS Wavelength

Wavelength Zones are infrastructure deployments embedded within the telecommunications providers’ datacenters at the edge of the 5G networks

  • Brings AWS services to the edge of the 5G networks (e.g., EC2, EBS, VPC...)
  • Ultra-low latency applications through 5G networks
  • Traffic doesn’t leave the Communication Service Provider’s (CSP) network
  • High-bandwidth and secure connection to the parent AWS Region
  • No additional charges or service agreements
  • Use cases:
    • Smart Cities
    • ML-assisted diagnostics
    • Connected Vehicles
    • Interactive Live Video Streams
    • AR/VR
    • Real-time Gaming
    • ...

AWS Local Zones

  • Places AWS compute, storage, database, and other selected AWS services closer to end users to run latency-sensitive applications
  • Extend your VPC to more locations – "Extension of an AWS Region"
  • Compatible with EC2, RDS, ECS, EBS, ElastiCache, Direct Connect ...

VPC

A VPC is a logically isolated, software-defined portion of the AWS cloud within a region

Note

  • CIDR block size can be between /16 and /28
  • The CIDR block must not overlap with any existing CIDR block that's associated with the VPC
  • You cannot increase or decrease the size of an existing CIDR block, but you can add a secondary CIDR block to an existing VPC
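As a sketch of adding a secondary CIDR block with boto3 (the VPC ID and range are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Add a secondary CIDR block to an existing VPC; the new range must not
# overlap any CIDR already associated with the VPC
ec2.associate_vpc_cidr_block(
    VpcId="vpc-0123456789abcdef0",  # placeholder VPC ID
    CidrBlock="10.1.0.0/16",
)
```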

Tip

CIDR stands for Classless Inter-Domain Routing

Subnet (IPv4)

  • AWS reserves 5 IP addresses (first 4 & last 1) in each subnet
  • These 5 IP addresses are not available for use and can’t be assigned to an EC2 instance

Example

If the CIDR block is 10.0.0.0/24, the reserved IP addresses are:

  • 10.0.0.0 – Network Address
  • 10.0.0.1 – reserved by AWS for the VPC router
  • 10.0.0.2 – reserved by AWS for mapping to Amazon-provided DNS
  • 10.0.0.3 – reserved by AWS for future use
  • 10.0.0.255 – Network Broadcast Address. AWS does not support broadcast in a VPC, therefore the address is reserved

VPC & Subnets Primer

  • VPC - Virtual Private Cloud: private network to deploy your resources (regional resource)
  • Subnets allow you to partition your network inside your VPC (Availability Zone resource)
  • A public subnet is a subnet that is accessible from the internet
  • A private subnet is a subnet that is not accessible from the internet
  • To define access to the internet and between subnets, we use Route Tables.

Internet Gateway

  • Allows resources (e.g., EC2 instances) in a VPC to connect to the Internet
  • It scales horizontally and is highly available and redundant
  • Must be created separately from a VPC
  • One VPC can only be attached to one IGW and vice versa

Tip

Internet Gateways on their own do not allow Internet access -> route tables must also be edited! For outbound-only (egress) traffic: an Egress-only Internet Gateway handles IPv6, while a NAT Gateway handles IPv4.

Bastion Hosts

  • We can use a Bastion Host to SSH into our private EC2 instances
  • The bastion is in the public subnet which is then connected to all other private subnets
  • Bastion Host security group must allow inbound from the internet on port 22 from restricted CIDR
  • Security Group of the EC2 Instances must allow the Security Group of the Bastion Host, or the private IP of the Bastion host

NAT Instance

NAT = Network Address Translation

  • Allows EC2 instances in private subnets to connect to the Internet
  • Must be launched in a public subnet
  • Must disable EC2 setting: Source/destination Check
  • Must have Elastic IP attached to it
  • RouteTables must be configured to route traffic from private subnets to the NAT Instance

NAT Gateway

AWS-managed NAT, higher bandwidth, high availability, no administration

  • Pay per hour for usage and bandwidth
  • NATGW is created in a specific Availability Zone, uses an Elastic IP
  • Can’t be used by EC2 instances in the same subnet (only from other subnets)
  • Requires an IGW (Private Subnet => NATGW => IGW)
  • 5 Gbps of bandwidth with automatic scaling up to 45 Gbps
  • No Security Groups to manage / required
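A minimal boto3 sketch of that setup, assuming placeholder subnet and route table IDs:

```python
import boto3

ec2 = boto3.client("ec2")

# Allocate an Elastic IP and create the NAT Gateway in a *public* subnet
eip = ec2.allocate_address(Domain="vpc")
natgw = ec2.create_nat_gateway(
    SubnetId="subnet-0123456789abcdef0",   # public subnet (placeholder ID)
    AllocationId=eip["AllocationId"],
)

# Route the private subnet's Internet-bound traffic through the NAT Gateway
ec2.create_route(
    RouteTableId="rtb-0123456789abcdef0",  # private subnet's route table (placeholder)
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId=natgw["NatGateway"]["NatGatewayId"],
)
```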

Tip

Ensure that your connection is using a TCP, UDP, or ICMP protocol only.

Network Access Control List(NACL)

NACLs are like a firewall that controls traffic from and to subnets

  • One NACL per subnet, new subnets are assigned the Default NACL
  • You define NACL Rules:
    • Rules have a number (1-32766); a lower number means higher precedence
    • First rule match will drive the decision
    • The last rule is an asterisk (*) and denies a request in case of no rule match
    • AWS recommends adding rules by increment of 100
  • Newly created NACLs will deny everything
  • NACL are a great way of blocking a specific IP address at the subnet level

Default NACL

  • Accepts everything inbound/outbound with the subnets it’s associated with
  • Do NOT modify the Default NACL, instead create custom NACLs

Ephemeral Ports

  • For any two endpoints to establish a connection, they must use ports
  • Clients connect to a defined port, and expect a response on an ephemeral port
  • Different Operating Systems use different port ranges, examples:
    • IANA & MS Windows10 -> 49152–65535
    • Many Linux Kernel -> 32768–60999
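Because NACLs are stateless, return traffic on ephemeral ports must be allowed explicitly. A sketch of such an inbound rule with boto3 (the NACL ID is a placeholder; protocol "6" is TCP):

```python
import boto3

ec2 = boto3.client("ec2")

# Inbound NACL rule allowing return traffic on ephemeral ports (TCP 1024-65535)
ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",  # placeholder NACL ID
    RuleNumber=100,
    Protocol="6",                          # TCP
    RuleAction="allow",
    Egress=False,                          # inbound rule
    CidrBlock="0.0.0.0/0",
    PortRange={"From": 1024, "To": 65535},
)
```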

Network ACL VS Security Groups

  • Security Group
    • A firewall that controls traffic to and from an ENI (Elastic Network Interface) / an EC2 instance
    • Can have only ALLOW rules
    • Is stateful: return traffic is automatically allowed, regardless of any rules
    • All rules are evaluated before deciding whether to allow traffic
    • Applies to an instance only if the security group is specified when launching the instance, or is associated with the instance later on
    • Rules include IP addresses and other security groups
    • Security Groups can be applied to instances in any subnet
  • Network ACL
    • A firewall that controls traffic from and to a subnet
    • Can have ALLOW and DENY rules
    • Is stateless: return traffic must be explicitly allowed by rules
    • Rules are processed in number order when deciding whether to allow traffic
    • Attached at the subnet level
    • Rules only include IP addresses


Tip

  • A stateful firewall allows the return traffic automatically
  • A stateless firewall checks for an allow rule in both directions of the connection

VPC Flow Logs

Capture information about the IP traffic flowing in or out of network interfaces in a VPC:

  • VPC Flow Logs
  • Subnet Flow Logs
  • Elastic Network Interface Flow Logs
  • Helps to monitor & troubleshoot connectivity issues
  • Example:
    • Subnets to internet
    • Subnets to subnets
    • Internet to subnets
  • Captures network information from AWS managed interfaces too: Elastic Load Balancers, ElastiCache, RDS, Aurora, etc...
  • VPC Flow logs data can go to S3 bucket / CloudWatch Logs
  • Flow logs can be created at the following levels:
    • VPC
    • Subnet
    • Network interface
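A minimal boto3 sketch of creating a VPC-level flow log delivered to S3 (the VPC ID and bucket ARN are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# VPC-level flow log delivered to an S3 bucket
ec2.create_flow_logs(
    ResourceIds=["vpc-0123456789abcdef0"],        # placeholder VPC ID
    ResourceType="VPC",                           # or "Subnet" / "NetworkInterface"
    TrafficType="ALL",                            # ACCEPT, REJECT, or ALL
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::my-flow-logs-bucket",  # placeholder bucket ARN
)
```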

Tip

Flow logs do not provide the ability to view a real-time stream of traffic. Logs are published every 10 minutes by default but can be configured for faster delivery.

Warning

After you create a flow log, you CANNOT change its configuration or the flow log record format

VPC to VPC Connectivity Options

  • VPC Peering
  • AWS Transit Gateway
  • Software S2S VPN
  • Software VPN to AWS VPN
  • AWS Managed VPN
  • AWS PrivateLink

VPC Peering

Connect two VPCs privately using AWS’s network

  • Make them behave as if they were in the same network
  • Must not have overlapping CIDRs (IP address ranges)
  • VPC Peering connections are not transitive (a connection must be established for each pair of VPCs that need to communicate with one another)
  • You must update route tables in each VPC’s subnets to ensure EC2 instances can communicate with each other
  • VPC Peering enables routing using private IPv4 or IPv6 addresses
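A sketch of requesting and accepting a cross-region peering connection with boto3 (VPC IDs and regions are placeholders):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Request a peering connection to a VPC in another region
peering = ec2.create_vpc_peering_connection(
    VpcId="vpc-0123456789abcdef0",      # requester VPC (placeholder)
    PeerVpcId="vpc-0fedcba9876543210",  # accepter VPC (placeholder)
    PeerRegion="eu-west-1",
)

# The owner of the peer VPC must accept it, from the peer region/account
ec2_peer = boto3.client("ec2", region_name="eu-west-1")
ec2_peer.accept_vpc_peering_connection(
    VpcPeeringConnectionId=peering["VpcPeeringConnection"]["VpcPeeringConnectionId"]
)
# Remember: route tables in both VPCs still need routes to each other's CIDR
```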

Tip

  • You can create a VPC Peering connection between VPCs in different AWS accounts/regions
  • You can reference a security group in a peered VPC (works cross-account – same region)
  • You cannot reference the security group of a peer VPC that's in a different Region; instead, use the CIDR block of the peer VPC
  • The BGP protocol uses TCP port 179 to establish a peering session; when establishing AWS VPN connectivity using BGP, verify that TCP port 179 is not blocked in the network

VPC Endpoints

VPC Endpoints (powered by AWS PrivateLink) allow resources inside a VPC to connect to other AWS Services outside the VPC using a private network instead of the public www network

  • Gives you enhanced security and lower latency to access AWS services
  • Redundant and scales horizontally
  • Removes the need for an IGW, NATGW, ... to access AWS services
  • VPC Endpoint Gateway: S3 & DynamoDB
  • VPC Endpoint Interface: the rest
  • Only used within your VPC

Types of Endpoints

  • Interface Endpoint (powered by AWS PrivateLink)
    • An ENI with a private IP address within the VPC subnet of the resources that are consuming the service
    • Provisions the ENI as the entry point (must attach a Security Group)
    • Uses DNS entries to redirect traffic
    • Supports most AWS services (API Gateway, CloudFormation, CloudWatch, etc.)
    • Security: Security Groups
    • $ per hour + $ per GB of data processed
  • Gateway Endpoint
    • Provisions a gateway that must be used as a target in a route table (does not use security groups)
    • Uses prefix lists in the route table to redirect traffic
    • Supports both S3 and DynamoDB
    • Security: VPC Endpoint Policies
    • Free
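A minimal boto3 sketch of creating a gateway endpoint for S3 (VPC, service name, and route table IDs are placeholders):

```python
import boto3

ec2 = boto3.client("ec2")

# Gateway endpoint for S3: added as a target in the subnet's route table
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",             # placeholder VPC ID
    ServiceName="com.amazonaws.us-east-1.s3",  # region-specific service name
    RouteTableIds=["rtb-0123456789abcdef0"],   # placeholder route table ID
)
```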

Site to Site VPN & Direct Connect

  • Site-to-Site VPN
    • AWS VPN is a managed IPSec VPN
    • Connects an on-premises network to AWS
    • The connection is automatically encrypted
    • Supports static routes or BGP peering/routing
    • Goes over the public internet
    • A VGW (Virtual Private Gateway) is deployed on the AWS side
    • A CGW (Customer Gateway) is deployed on the customer side
  • Direct Connect (DX)
    • DX is a physical fibre connection between on-premises and AWS running at 1 Gbps, 10 Gbps, or 100 Gbps
    • A cross-connect between the AWS DX router and the customer/partner DX router
    • A DX port (1000-Base-LX or 10GBASE-LR) must be allocated in a DX location
    • The customer router is connected to the DX router in the DX location
    • The connection is private, secure, and fast, and goes over a private network
    • Takes at least a month to establish
    • Speeds from 50 Mbps to 500 Mbps can also be accessed via an APN partner
    • DX connections are NOT encrypted!
    • Use an IPSec S2S VPN connection over a VIF to add encryption in transit

Note

Site-to-Site VPN and Direct Connect cannot be used to access VPC gateway endpoints (S3 & DynamoDB); interface endpoints can be reached from on-premises

Note

  • A VIF is a virtual interface (802.1Q VLAN) and a BGP session
  • A Private VIF connects to a single VPC in the same AWS Region using a VGW
  • A Public VIF can be used to connect to AWS Public services in any Region (but not the Internet)
  • Multiple Private VIFs can be used to connect to multiple VPCs in the Region
  • VIFs can also be shared with other AWS accounts – known as hosted VIFs

AWS VPN CloudHub

  • A VGW is deployed on the AWS side
  • Network traffic may go between a VPC and a remote office
  • Network traffic between offices can also be routed over the IPSec VPN
  • Remote offices connect to the VGW in a hub-and-spoke mode
  • Each office must use a unique BGP ASN

Transit Gateway

Transit Gateway is a network transit hub that interconnects VPCs and on-premises networks

  • For having transitive peering between thousands of VPCs and on-premises networks, hub-and-spoke (star) connection
  • One single Gateway to provide this functionality
  • TGWs can be attached to VPNs, Direct Connect Gateways, 3rd party appliances and TGWs in other Regions/accounts

Note

Connections supported by a transit gateway:

  • VPN to a physical datacenter
  • Direct Connect Gateway
  • Transitive connections between multiple VPCs

Tip

  • VPC Peering does not have an aggregate bandwidth limitation.
  • Transit gateway connections to a VPC provide up to 50 Gbps of bandwidth.
  • A VPN connection provides a maximum throughput of 1.25 Gbps.

IPv6 in VPC

  • IPv4 cannot be disabled for your VPC and subnets
  • All IPv6 addresses are publicly routable (no NAT)
  • You can enable IPv6 (they’re public IP addresses) to operate in dual-stack mode
  • Your EC2 instances will get at least a private internal IPv4 and a public IPv6
  • They can communicate using either IPv4 or IPv6 to the internet through an Internet Gateway

Egress-only Internet Gateway

  • Used for IPv6 only (similar to a NAT Gateway but for IPv6)
  • Allows instances in your VPC outbound connections over IPv6 while preventing the internet from initiating an IPv6 connection to your instances
  • You must update the Route Tables

AWS Network Firewall

  • The AWS Network Firewall supports outbound traffic control using HTTPS (SNI)/HTTP protocol URL filtering, access control lists (ACLs), DNS queries, and protocol detection.
  • AWS Network Firewall rules can be based on domain, port, protocol, IP addresses, and pattern matching.

Traffic Mirroring

  • VPC Traffic Mirroring duplicates inbound and outbound traffic for Amazon EC2 instances within an Amazon VPC without the need to install anything on the instances themselves.
  • You are able to send the duplicated traffic to the destination of your choice for analysis.
  • VPC Traffic Mirroring collects the full packet, allowing payload analysis, while other tools collect information of protocol, source, and destination. This lets you analyze for both active and passive attacks.

Tip

The following traffic types cannot be mirrored:

  • ARP
  • DHCP
  • Instance metadata service
  • NTP
  • Windows activation

Gateway

AWS Storage Gateway

AWS Storage Gateway is a service that connects an on-premises data center to the cloud. It provides file-based (POSIX-compatible NFS/SMB), block-based (iSCSI volume), and tape interfaces backed by AWS storage, and can be used as a target for data backups

Tip

To use the HA feature of Storage Gateway, the VMware environment must provide the following:

  • A cluster with vSphere HA enabled
  • A shared datastore

File Gateway

  • File gateway provides a virtual on-premises file server
  • Store and retrieve files as objects in Amazon S3
  • Use with on-premises applications, and EC2-based applications that need file storage in S3 for object-based workloads
  • File gateway offers SMB or NFS-based access to data in Amazon S3 with local caching
  • A local cache provides low latency access to recently used data
  • A virtual gateway appliance runs on Hyper-V, VMware, or EC2

Tip

CachePercentDirty is an Amazon CloudWatch metric for Amazon S3 File Gateway that gives the percentage of data not yet uploaded to AWS from the local cache. This value should be near zero, which indicates that all cached data has been properly uploaded to AWS.

Volume Gateway

  • The volume gateway supports block-based volumes
  • Block storage – iSCSI protocol
  • Cached Volume mode – the entire dataset is stored on S3 and a cache of the most frequently accessed data is cached on-site
  • Stored Volume mode – the entire dataset is stored on-site and is asynchronously backed up to S3 (EBS point-in-time snapshots). Snapshots are incremental and compressed

Tape Gateway

  • Used for backup with popular backup software
  • Each gateway is preconfigured with a media changer and tape drives. Supported by NetBackup, Backup Exec, Veeam etc.
  • When creating virtual tapes, you select one of the following sizes: 100 GB, 200 GB, 400 GB, 800 GB, 1.5 TB, and 2.5 TB
  • A tape gateway can have up to 1,500 virtual tapes with a maximum aggregate capacity of 1 PB
  • All data transferred between the gateway and AWS storage is encrypted using SSL
  • All data stored by tape gateway in S3 is encrypted server-side with Amazon S3-Managed Encryption Keys (SSE-S3)

S3

Use Cases

  • Backup and storage
  • Disaster Recovery
  • Archive
  • Hybrid Cloud storage
  • Application hosting
  • Media hosting
  • Data lakes & big data analytics
  • Software delivery
  • Static website

Buckets

  • Amazon S3 allows people to store objects (files) in "buckets" (directories)
  • Buckets must have a globally unique name (across all regions all accounts)
  • Buckets are defined at the region level
  • S3 looks like a global service but buckets are created in a region
  • Naming convention
    • Can consist only of lowercase letters, numbers, dots (.), and hyphens (-)
    • 3-63 characters long
    • Not an IP address
    • Must start with a lowercase letter or a number
    • Must NOT start with the prefix xn--
    • Must NOT end with the suffix -s3alias (reserved for access point alias names)
    • Must NOT end with the suffix --ol-s3 (reserved for Object Lambda Access Point alias names)

Objects

  • Objects (files) have a Key (the object name is the key, the data is the value)
    • The key is the FULL path
    • A "folder" is just a shared prefix for grouping objects
    • The key is composed of prefix + object name
    • There is no real concept of directories, just keys with very long names that contain slashes ("/")
  • Object Values are the content of the body
  • Tags (Unicode key / value pair – up to 10) – useful for security / lifecycle

Note

An object consists of:

  • Key (the name of the object)
  • Version ID (if versioning is enabled)
  • Value (actual data)
  • Metadata (list of text key / value pairs – system or user metadata)
  • Subresources
  • Access control information

Tip

  • Folders can be created within folders
  • Buckets cannot be created within buckets

Multipart Upload

  • Multipart upload uploads objects in parts independently, in parallel and in any order
  • Performed using the S3 Multipart upload API
  • It is recommended for objects of 100 MB or larger
  • Can be used for objects from 5 MB up to 5 TB
  • Must be used for objects larger than 5 GB

Tip

It's a best practice to use aws s3 commands (such as aws s3 cp) for multipart uploads and downloads, because these commands automatically perform multipart uploading and downloading based on the file size.

By comparison, aws s3api commands, such as aws s3api create-multipart-upload, should be used only when aws s3 commands don't support a specific upload need, such as when the multipart upload involves multiple servers, a multipart upload is manually stopped and resumed later, or the aws s3 command doesn't support a required request parameter.

Note

Content-MD5 header is an optional header in the Amazon S3 REST requests, which can be used to check the integrity of the data and ensure data is not corrupt during transit. When data is uploaded using PutObject having Content-MD5 as a request header, Amazon S3 checks the data against the Content-MD5 value. If the value is not matched, an error is generated.

Versioning

Versioning is a means of keeping multiple variants of an object in the same bucket

  • Use versioning to preserve, retrieve, and restore every version of every object stored in your Amazon S3 bucket
  • Versioning-enabled buckets enable you to recover objects from accidental deletion or overwrite
  • It is enabled at the bucket level
  • Same key overwrite will increment the "version": 1, 2, 3....
  • It is best practice to version your buckets
    • Protect against unintended deletes (ability to restore a version)
    • Easy roll back to previous version

Notes

  • Any file that is not versioned prior to enabling versioning will have version "null"
  • Suspending versioning does not delete the previous versions

Security

  • User based
    • IAM policies - which API calls should be allowed for a specific user from IAM console
  • Resource Based
    • Bucket Policies - bucket wide rules from the S3 console - allows cross account
    • Object Access Control List (ACL) – finer grain (can be disabled)
    • Bucket Access Control List (ACL) – less common (can be disabled)

Note

An IAM principal can access an S3 object if

  • the user IAM permissions allow it OR the resource policy ALLOWS it
  • AND there’s no explicit DENY
  • Encryption: encrypt objects in Amazon S3 using encryption keys
  • Networking:
    • Supports VPC Endpoints (for instances in VPC without www internet access)
  • Logging and Audit:
    • S3 Access Logs can be stored in another S3 bucket
    • API calls can be logged in AWS CloudTrail
  • User Security:
    • MFA Delete: MFA (multi factor authentication) can be required in versioned buckets to delete objects
    • Pre-Signed URLs: URLs that are valid only for a limited time (ex: premium video service for logged in users)

Tip

AWS generally recommends using S3 bucket policies or IAM policies rather than ACLs

S3 Bucket Policies

  • JSON based policies
    • Resources: buckets and objects
    • Effect: Allow / Deny
    • Actions: Set of API to Allow or Deny
    • Principal: the account or user to apply the policy to
  • Use S3 bucket policies to:
    • Grant public access to the bucket
    • Force objects to be encrypted at upload
    • Grant access to another account (Cross Account)

IAM Policy VS S3 Bucket Policy

  • Use IAM policies if:
    • You need to control access to AWS services other than S3
    • You have numerous S3 buckets each with different permissions requirements (IAM policies will be easier to manage)
    • You prefer to keep access control policies in the IAM environment
  • Use S3 bucket policies if:
    • You want a simple way to grant cross-account access to your S3 environment, without using IAM roles
    • Your IAM policies are reaching the size limits
    • You prefer to keep access control policies in the S3 environment

Objects Encryption

There are 4 methods of encrypting objects in S3 (a short sketch of the first two follows the list)

  • SSE-S3: Server-Side Encryption with Amazon S3-Managed Keys
    • Encrypts S3 objects using keys handled & managed by AWS
    • Object is encrypted server-side
    • AES-256 encryption type
    • Must set header: "x-amz-server-side-encryption": "AES256"
  • SSE-KMS: Server-Side Encryption with KMS Keys stored in AWS KMS
    • Leverage AWS Key Management Service to manage encryption keys
    • KMS advantages: user control + audit key usage using CloudTrail
    • Object is encrypted server-side
    • Must set header: "x-amz-server-side-encryption": "aws:kms"
  • SSE-C: Server-Side Encryption with Customer-Provided Keys
    • Server-side encryption using data keys fully managed by the customer outside of AWS
    • Amazon S3 does not store the encryption key you provide
    • HTTPS must be used
    • Encryption key must be provided in HTTP headers with every HTTP request made
  • Client-Side Encryption
    • Use a client library such as the Amazon S3 Client-Side Encryption Library
    • Clients must encrypt data themselves before sending to S3
    • Clients must decrypt data themselves when retrieving from S3
    • Customer fully manages the keys and the encryption cycle
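For the two header-based server-side methods, a minimal boto3 sketch (bucket, keys, and KMS alias are placeholders); boto3 sets the x-amz-server-side-encryption header from the ServerSideEncryption parameter:

```python
import boto3

s3 = boto3.client("s3")

# SSE-S3: AWS-managed keys
s3.put_object(
    Bucket="my-bucket", Key="doc.txt", Body=b"hello",
    ServerSideEncryption="AES256",
)

# SSE-KMS: keys stored in AWS KMS (key alias is a placeholder)
s3.put_object(
    Bucket="my-bucket", Key="doc-kms.txt", Body=b"hello",
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/my-key",
)
```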

Default Encryption

  • Amazon S3 default encryption provides a way to set the default encryption behavior for an S3 bucket
  • You can set default encryption on a bucket so that all new objects are encrypted when they are stored in the bucket
  • The objects are encrypted using server-side encryption
  • Amazon S3 encrypts objects before saving them to disk and decrypts them when the objects are downloaded
  • There is no change to the encryption of objects that existed in the bucket before default encryption was enabled

Encryption in transit (SSL/TLS)

Encryption in flight is also called SSL/TLS

  • Amazon S3 exposes:
    • HTTP endpoint: non encrypted
    • HTTPS endpoint: encryption in flight
  • You’re free to use the endpoint you want, but HTTPS is recommended
  • Most clients would use the HTTPS endpoint by default
  • HTTPS is mandatory for SSE-C

MFA Delete (Multi-Factor Authentication Delete)

  • Adds MFA requirement for bucket owners to the following operations:
    • Changing the versioning state of a bucket
    • Permanently deleting an object version
  • The x-amz-mfa request header must be included in the above requests
  • The second factor is a token generated by a hardware device or software program
  • Requires versioning to be enabled on the bucket

Tip

  • Versioning can be enabled by:
    • Bucket owners (root account)
    • AWS account that created the bucket
    • Authorized IAM users
  • MFA delete can be enabled by:
    • Bucket owner (root account)

S3 Event Notifications

  • Sends notifications when events happen in buckets
  • Destinations include:
    • Amazon Simple Notification Service (SNS) topics
    • Amazon Simple Queue Service (SQS) queues
    • AWS Lambda
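As a sketch of wiring up a Lambda destination with boto3 (bucket name and function ARN are placeholders; the Lambda function's resource policy must separately allow S3 to invoke it):

```python
import boto3

s3 = boto3.client("s3")

# Invoke a Lambda function whenever a .jpg object is created in the bucket
s3.put_bucket_notification_configuration(
    Bucket="my-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:thumbnailer",
            "Events": ["s3:ObjectCreated:*"],
            "Filter": {"Key": {"FilterRules": [{"Name": "suffix", "Value": ".jpg"}]}},
        }]
    },
)
```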

Server Access Logging

  • Provides detailed records for the requests that are made to a bucket
  • Details include:
    • requester
    • bucket name
    • request time
    • request action
    • response status
    • error code (if applicable)
  • Disabled by default
  • Only pay for the storage space used
  • Must configure a separate bucket as the destination (can specify a prefix)
  • Must grant write permissions to the Amazon S3 Log Delivery group on destination bucket

Static Websites Hosting

  • S3 can host static websites and have them accessible on the www
  • The website URL will be:
    • http://<bucket-name>.s3-website-<AWS-region>.amazonaws.com
    • http://<bucket-name>.s3-website.<AWS-region>.amazonaws.com
  • If you get a 403 (Forbidden) error, make sure the bucket policy allows public reads!
  • If a client does a cross-origin request on our S3 bucket, we need to enable the correct CORS headers

Replication (CRR & SRR)

  • Must enable versioning in source and destination
  • Must give proper IAM permissions to S3
  • Buckets can be in different accounts
  • Copying is asynchronous
  • Use Cases:
    • Cross Region Replication (CRR): compliance, lower latency access, replication across accounts
    • Same Region Replication (SRR): log aggregation, live replication between production and test accounts

Note

  • After you enable Replication, only new objects are replicated
  • Optionally, you can replicate existing objects using S3 Batch Replication
    • Replicates existing objects and objects that failed replication
  • For DELETE operations
    • Can replicate delete markers from source to target (optional setting)
    • Deletions with a version ID are not replicated (to avoid malicious deletes)
  • There is no "chaining" of replication
    • If bucket 1 has replication into bucket 2, which has replication into bucket 3
    • Then objects created in bucket 1 are not replicated to bucket 3

S3 Pre-Signed URLs

  • Can generate pre-signed URLs using SDK or CLI
    • For downloads (easy, can use the CLI)
    • For uploads (harder, must use the SDK)
  • Valid for a default of 3600 seconds, can change timeout with --expires-in <TIME_BY_SECONDS> argument
  • Users given a pre-signed URL inherit the permissions of the person who generated the URL for GET / PUT
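A minimal boto3 sketch of generating a download URL (bucket and key are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Pre-signed GET URL valid for 1 hour
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "my-bucket", "Key": "premium/video.mp4"},
    ExpiresIn=3600,  # seconds
)
print(url)
```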

CORS with Amazon S3

Allows requests from an origin to another origin
Origin is defined by DNS name, protocol, and port

  • Enabled through setting:
    • Access-Control-Allow-Origin
    • Access-Control-Allow-Methods
    • Access-Control-Allow-Headers
  • These settings are defined using rules
  • Rules are added using JSON files in S3
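A sketch of such a rule set applied with boto3 (bucket name and origin are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Allow cross-origin GET requests from one web origin
s3.put_bucket_cors(
    Bucket="my-bucket",
    CORSConfiguration={
        "CORSRules": [{
            "AllowedOrigins": ["https://www.example.com"],  # placeholder origin
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }]
    },
)
```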

Cross Account Access

Methods:

  • Resource-based policies and IAM policies for programmatic-only access to S3 bucket objects
  • Resource-based ACL and IAM policies for programmatic-only access to S3 bucket objects
  • Cross-account IAM roles for programmatic and console access to S3 bucket objects

Storage Classes

  • General Purpose
  • Infrequent Access
    • Standard-Infrequent Access (IA)
      • 99.9% Availability
      • Use cases: Disaster Recovery, backups
    • One Zone-Infrequent Access
      • High durability (99.999999999%) in a single AZ; data lost when AZ is destroyed
      • 99.5% Availability
      • Use Cases: Storing secondary backup copies of on-premises data, or data you can recreate
  • Glacier Storage Classes
    • Glacier Instant Retrieval
      • Minimum storage duration of 90 days
      • Millisecond retrieval, great for data accessed once a quarter
    • Glacier Flexible Retrieval
      • Minimum storage duration of 90 days
      • Expedited (1 to 5 minutes)
      • Standard (3 to 5 hours)
      • Bulk (5 to 12 hours) – free
    • Glacier Deep Archive
      • Minimum storage duration of 180 days
      • Standard (12 hours)
      • Bulk (48 hours)
  • Intelligent Tiering
    • Small monthly monitoring and auto-tiering fee
    • Moves objects automatically between Access Tiers based on usage
    • There are no retrieval charges in S3 Intelligent-Tiering
    • Frequent Access tier (automatic): default tier
    • Infrequent Access tier (automatic): objects not accessed for 30 days
    • Archive Instant Access tier (automatic): objects not accessed for 90 days
    • Archive Access tier (optional): configurable from 90 days to 700+ days
    • Deep Archive Access tier (optional): config. from 180 days to 700+ days

Note

Can move between classes manually or using S3 Lifecycle configurations

Lifecycle Rules

  • Transition Actions – configure objects to transition to another storage class
    • Move objects to Standard IA class 60 days after creation
    • Move to Glacier for archiving after 6 months
  • Expiration actions – configure objects to expire (be deleted) after some time
    • Access log files can be set to delete after 365 days
    • Can be used to delete old versions of files (if versioning is enabled)
    • Can be used to delete incomplete Multi-Part uploads
  • Rules can be created for a certain prefix (example: s3://mybucket/mp3/*)
  • Rules can be created for certain object tags (example: Department: Finance)
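A minimal boto3 sketch combining a transition and an expiration rule (bucket and prefix are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Move mp3/ objects to Standard-IA after 60 days and delete them after 365 days
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "mp3-tiering",
            "Filter": {"Prefix": "mp3/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 60, "StorageClass": "STANDARD_IA"}],
            "Expiration": {"Days": 365},
        }]
    },
)
```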

Lifecycle Transitions

Select & Glacier Select

  • Retrieve less data using SQL by performing server-side filtering
  • Can filter by rows & columns (simple SQL statements)
  • Less network transfer, less CPU cost client-side

S3 Batch Operations

  • Perform bulk operations on existing S3 objects with a single request, example:
    • Modify object metadata & properties
    • Copy objects between S3 buckets
    • Encrypt un-encrypted objects
    • Modify ACLs,tags
    • Restore objects from S3 Glacier
    • Invoke Lambda function to perform custom action on each object
  • A job consists of a list of objects, the action to perform, and optional parameters
  • S3 Batch Operations manages retries, tracks progress, sends completion notifications, generates reports, ...
  • You can use S3 Inventory to get object list and use S3 Select to filter your objects

Pre-Signed URLs

  • Generate pre-signed URLs using the S3 Console, AWS CLI or SDK
  • URL Expiration
    • S3 Console – 1 min up to 720 mins (12 hours)
    • AWS CLI – configure expiration with --expires-in parameter in seconds (default 3600 secs, max. 604800 secs ~ 168 hours)
  • Users given a pre-signed URL inherit the permissions of the user that generated the URL for GET / PUT

Example

  • Allow only logged-in users to download a premium video from your S3 bucket
  • Allow an ever-changing list of users to download files by generating URLs dynamically
  • Allow temporarily a user to upload a file to a precise location in your S3 bucket

S3 Object Lock & Glacier Vault Lock

  • S3 Object Lock
    • Adopt a WORM (Write Once Read Many) model
    • Block an object version's deletion for a specified amount of time
    • Retention mode - Compliance:
      • Object versions can't be overwritten or deleted by any user, including the root user
      • Object retention modes can't be changed, and retention periods can't be shortened
    • Retention mode - Governance:
      • Most users can't overwrite or delete an object version or alter its lock settings
      • Some users have special permissions to change the retention or delete the object
    • Retention Period: protect the object for a fixed period, it can be extended
    • Legal Hold:
      • Protect the object indefinitely, independent from the retention period
      • Can be freely placed and removed using the s3:PutObjectLegalHold IAM permission
  • Glacier Vault Lock
    • Adopt a WORM (Write Once Read Many) model
    • Create a Vault Lock Policy
    • Lock the policy against future edits (can no longer be changed)
    • Helpful for compliance and data retention
    • Locking a vault takes two steps:
      • Initiate the lock by attaching a Vault Lock policy to your vault, which sets the lock to an in-progress state and returns a lock ID
        • While the policy is in the in-progress state, you have 24 hours to validate your Vault Lock policy before the lock ID expires
        • To prevent your vault from exiting the in-progress state, you must complete the Vault Lock process within these 24 hours; otherwise, your Vault Lock policy will be deleted
      • Use the lock ID to complete the lock process. If the Vault Lock policy doesn't work as expected, you can stop the Vault Lock process and restart from the beginning

Warning

  • After you enable Object Lock on a bucket, you can't disable Object Lock or suspend versioning for that bucket.
  • S3 buckets with Object Lock can't be used as destination buckets for server access logs.

Access Points & Object Lambda

  • S3 Access Points
    • Each Access Point gets its own DNS and policy to limit who can access it (e.g., a specific IAM user / group)
    • One policy per Access Point => easier to manage than complex bucket policies
  • S3 Object Lambda
    • Use AWS Lambda Functions to change the object before it is retrieved by the caller application
    • You can use your own functions or the AWS pre-built functions
    • Only one S3 bucket is needed, on top of which we create an S3 Access Point and S3 Object Lambda Access Points
    • Use Cases:
      • Redacting personally identifiable information for analytics or non-production environments
      • Converting across data formats, such as converting XML to JSON
      • Resizing and watermarking images on the fly using caller-specific details, such as the user who requested the object

AWS Storage Gateway

  • Bridge between on-premises data and cloud data in S3
  • Hybrid storage service to allow on-premises to seamlessly use the AWS Cloud
  • Use cases:
    • disaster recovery
    • backup & restore
    • tiered storage
    • on-premises cache & low-latency files access
  • Types of Storage Gateway:
    • S3 File Gateway
    • FSx File Gateway
    • Volume Gateway
    • Tape Gateway

Container

Docker Containers Management on AWS

  • Amazon Elastic Container Service (Amazon ECS)
    • Amazon’s own container platform
  • Amazon Elastic Kubernetes Service (Amazon EKS)
    • Amazon’s managed Kubernetes (open source)
  • AWS Fargate
    • Amazon’s own Serverless container platform
    • Works with ECS and with EKS
  • Amazon ECR:
    • Store container images

ECS

ECS: Elastic Container Service

Note

ECS Key Concepts:

  • Task Definition: template for your tasks. This is where you specify your Docker image, memory, CPU, ports, volumes, environment variables, etc.
  • Task: a running instance of a Task Definition
  • Container Instance (EC2 only): the EC2 instance that runs your Tasks
  • Cluster
    • EC2 Launch Type: a cluster of EC2 instances that run your Tasks
    • Fargate Launch Type: a cluster of Fargate Tasks
  • Service: a task-management mechanism that ensures a specified number of Tasks are up and running at all times

EC2 Launch Type

  • Launch Docker containers on AWS = Launch ECS Tasks on ECS Clusters
  • You must provision & maintain the infrastructure (the EC2 instances)
  • Each EC2 Instance must run the ECS Agent to register in the ECS Cluster
  • AWS takes care of starting / stopping containers
  • Has integrations with the Application Load Balancer
  • Charged per running EC2 instance
  • Docker volumes, EFS, and FSx for Windows File Server
  • You handle cluster optimization
  • More granular control over infrastructure

Fargate Launch Type

  • You do not provision the infrastructure (no EC2 instances to manage)
  • It’s all Serverless! You just create task definitions
  • AWS just runs ECS Tasks for you based on the CPU / RAM you need
  • To scale, just increase the number of tasks
  • Charged for running tasks
  • EFS integration
  • Fargate handles cluster optimization
  • Limited control, infrastructure is automated

IAM Roles for ECS

  • EC2 Instance Profile (EC2 Launch Type only)
    • Used by the ECS agent
    • Makes API calls to the ECS service
    • Sends container logs to CloudWatch Logs
    • Pulls Docker images from ECR
    • References sensitive data in Secrets Manager or SSM Parameter Store
  • ECS Task Role
    • Allows each task to have a specific role -> provides permissions to the container
    • Use different roles for the different ECS Services you run
    • The Task Role is defined in the task definition

Note

Container instances have access to all of the permissions that are supplied to the container instance role through instance metadata

Load Balancer Integrations

  • Application Load Balancer supported and works for most use cases
  • Network Load Balancer recommended only for high throughput / high performance use cases, or to pair it with AWS Private Link
  • Classic Load Balancer supported but not recommended (no advanced features – no Fargate)

Data Volumes (EFS)

  • Mount EFS file systems onto ECS tasks
  • Works for both EC2 and Fargate launch types
  • Tasks running in any AZ will share the same data in the EFS file system
  • Fargate + EFS = Serverless
  • Use cases: persistent multi-AZ shared storage for your containers
  • Note: Amazon S3 cannot be mounted as a file system

Data Volumes (Bind Mounts)

  • Share data between multiple containers in the same Task Definition
  • Works for both EC2 and Fargate tasks
  • EC2 Tasks – using EC2 instance storage
    • Data are tied to the lifecycle of the EC2 instance
  • Fargate Tasks – using ephemeral storage
    • Data are tied to the container(s) using them
    • 20 GiB – 200 GiB (default 20 GiB)
  • Use cases:
    • Share ephemeral data between multiple containers
    • "Sidecar" container pattern, where the "sidecar" container used to send metrics/logs to other destinations (separation of concerns)

ECS Service Auto Scaling

  • Automatically increase/decrease the desired number of ECS tasks
  • Amazon ECS Auto Scaling uses AWS Application Auto Scaling and can scale on:
    • ECS Service Average CPU Utilization
    • ECS Service Average Memory Utilization - scale on RAM
    • ALB Request Count Per Target – metric coming from the ALB
  • Scaling Policies:
    • Target Tracking – scale based on target value for a specific CloudWatch metric
    • Step Scaling – scale based on a specified CloudWatch Alarm
    • Scheduled Scaling – scale based on a specified date/time (predictable changes)
  • ECS Service Auto Scaling (task level) ≠ EC2 Auto Scaling (EC2 instance level)
  • Fargate Auto Scaling is much easier to setup (because Serverless)

Auto Scaling EC2 Instances

  • Accommodate ECS Service Scaling by adding underlying EC2 Instances
  • Auto Scaling Group Scaling
    • Scale your ASG based on CPU Utilization
    • Add EC2 instances over time
  • ECS Cluster Capacity Provider
    • Used to automatically provision and scale the infrastructure for your ECS Tasks
    • Capacity Provider paired with an Auto Scaling Group
    • Add EC2 Instances when you’re missing capacity (CPU, RAM...)

Task Definitions

  • Task definitions are metadata in JSON form to tell ECS how to run a Docker container
  • It contains crucial information, such as:
    • Image Name
    • Port Binding for Container and Host
    • Memory and CPU required
    • Environment variables
    • Networking information
    • IAM Role
    • Logging configuration (e.g., CloudWatch)
  • Can define up to 10 containers in a Task Definition
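A hypothetical minimal registration via boto3 (role ARN, names, and region are placeholders): one nginx container with a port mapping, CPU/memory sizing, and CloudWatch Logs configuration.

```python
import boto3

ecs = boto3.client("ecs")

# Register a minimal Fargate-compatible task definition
ecs.register_task_definition(
    family="web-app",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="256",
    memory="512",
    executionRoleArn="arn:aws:iam::111122223333:role/ecsTaskExecutionRole",  # placeholder
    containerDefinitions=[{
        "name": "web",
        "image": "nginx:latest",
        "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": "/ecs/web-app",
                "awslogs-region": "us-east-1",
                "awslogs-stream-prefix": "web",
            },
        },
    }],
)
```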

Task Placement Strategies

  • Binpack
    • Tasks are placed on the instances with the least available amount of CPU and memory
    • Minimizes the number of EC2 instances in use (cost savings)
  • Random
    • Tasks are placed randomly
  • Spread
    • Tasks are placed evenly based on the specified value
    • Example: instanceId, attribute:ecs.availability-zone, ...

Load Balancing (EC2 Launch Type)

  • We get a Dynamic Host Port Mapping if you define only the container port in the task definition
  • The ALB finds the right port on your EC2 instances
  • The EC2 instance’s Security Group must allow any port from the ALB’s Security Group

Load Balancing (Fargate)

  • Each task has a unique private IP
  • Only define the container port (host port is not applicable)

Example

  • ECS ENI Security Group -> Allow port 80 from the ALB
  • ALB Security Group -> Allow port 80/443 from web

Environment Variables

  • Environment Variable
    • Hardcoded – e.g., URLs
    • SSM Parameter Store – sensitive variables (e.g., API keys, shared configs)
    • Secrets Manager – sensitive variables (e.g., DB passwords)
  • Environment Files (bulk) – Amazon S3

ECS Anywhere

  • Run ECS or EKS on customer-managed infrastructure, supported by AWS
  • Customers can run Amazon ECS/EKS Anywhere on their own on-premises infrastructure on bare metal servers
  • You can also deploy ECS/EKS Anywhere using VMware vSphere

ECR

ECR: Elastic Container Registry

  • Private and Public repository (Amazon ECR Public Gallery)
  • This is where you store your Docker images so they can be run by ECS or Fargate
  • Fully integrated with ECS and EKS
  • Access is controlled through IAM (permission errors => policy)
  • Supports image vulnerability scanning, versioning, image tags, image lifecycle, ...
  • Supports Open Container Initiative (OCI) and Docker Registry HTTP API V2 standards
  • You can use Docker tools and Docker CLI commands
  • Can be accessed from any Docker environment – in the cloud, on-premises, or on your machine
  • Container images and artifacts are stored in S3
  • You can use namespaces to organize repositories
  • Public repositories allow everyone to access container images
  • Access control applies to private repositories:
    • IAM access control - Set policies to define access to container images in private repositories
    • Resource-based policies - Access control down to the individual API action

Note

  • Lifecycle policies - manage the lifecycle of the images in your repositories
  • Image scanning - identify software vulnerabilities in your container images
  • Cross-Region and cross-account replication – replicate images across accounts/Region
  • Pull through cache rules - cache repositories in remote public registries in your private Amazon ECR registry

EKS

EKS: Elastic Kubernetes Service

  • Kubernetes is an open-source system for automating deployment, scaling, and management of containerized (usually Docker) applications
  • Kubernetes is cloud-agnostic (can be used in any cloud – Azure, GCP...)
  • It is a way to launch managed Kubernetes clusters on AWS
  • It’s an alternative to ECS, similar goal but different API
  • EKS supports EC2 if you want to deploy worker nodes or Fargate to deploy serverless containers
  • Use case: if your company is already using Kubernetes on-premises or in another cloud, and wants to migrate to AWS using Kubernetes
  • For multiple regions, deploy one EKS cluster per region
  • Collect logs and metrics using CloudWatch Container Insights

Tip

It is recommended to run Kubernetes Workloads on Amazon EC2 Spot Instances with Amazon EKS as a cost optimization practice.

Node Types

  • Managed Node Groups
    • Creates and manages Nodes (EC2 instances) for you
    • Nodes are part of an ASG managed by EKS
    • Supports On-Demand or Spot Instances
  • Self-Managed Nodes
    • Nodes created by you and registered to the EKS cluster and managed by an ASG
    • You can use prebuilt AMI - Amazon EKS Optimized AMI
    • Supports On-Demand or Spot Instances
  • AWS Fargate
    • No maintenance required
    • No nodes to manage

Data Volumes

  • Need to specify StorageClass manifest on your EKS cluster
  • Leverages a Container Storage Interface (CSI) compliant driver
  • Support for...
    • Amazon EBS
    • Amazon EFS (works with Fargate)
    • Amazon FSx for Lustre
    • Amazon FSx for NetApp ONTAP

Auto Scaling

  • Workload Auto Scaling:
    • Vertical Pod Autoscaler - automatically adjusts the CPU and memory reservations for your pods to help "right size" your applications
    • Horizontal Pod Autoscaler - automatically scales the number of pods in a deployment, replication controller, or replica set based on that resource's CPU utilization
  • Cluster Auto Scaling:
    • Amazon EKS supports two autoscaling products:
      • Kubernetes Cluster Autoscaler
      • Karpenter open-source autoscaling project
    • The Cluster Autoscaler uses AWS scaling groups, while Karpenter works directly with the Amazon EC2 fleet

App Runner

  • Fully managed service that makes it easy to deploy containerized web applications and APIs at scale
  • No infrastructure experience required
  • Start with your source code or container image
  • Automatically builds and deploys the web app
  • Automatic scaling, highly available, load balancer, encryption
  • VPC access support
  • Connect to database, cache, and message queue services
  • Use cases: web apps, APIs, microservices, rapid production deployments

Databases

Choosing the Right Database

Questions to choose the right database based on your architecture:

  • Read-heavy, write-heavy, or balanced workload? Throughput needs? Will it change, does it need to scale or fluctuate during the day?
  • How much data to store and for how long? Will it grow? Average object size? How are they accessed?
  • Data durability? Source of truth for the data?
  • Latency requirements? Concurrent users?
  • Data model? How will you query the data? Joins? Structured? Semi-Structured?
  • Strong schema? More flexibility? Reporting? Search? RDBMS / NoSQL?
  • License costs? Switch to Cloud Native DB such as Aurora?

Database Types

  • RDBMS (= SQL / OLTP): RDS, Aurora – great for joins
  • NoSQL database – no joins, no SQL
    • DynamoDB (~JSON)
    • ElastiCache (key / value pairs)
    • Neptune (graphs)
    • DocumentDB (for MongoDB)
    • Keyspaces (for Apache Cassandra)
  • Object Store
    • S3 (for big objects)
    • Glacier (for backups / archives)
  • Data Warehouse (= SQL Analytics / BI)
    • Redshift (OLAP)
    • Athena
    • EMR
  • Search: OpenSearch (JSON) – free text, unstructured searches
  • Graphs: Amazon Neptune – displays relationships between data
  • Ledger: Amazon Quantum Ledger Database
  • Time series: Amazon Timestream
  • Note: some databases are being discussed in the Data & Analytics section

RDS

RDS stands for Relational Database Service

  • It’s a managed DB service for databases that use SQL as a query language
  • It allows you to create databases in the cloud that are managed by AWS
    • Postgres
    • MySQL
    • MariaDB
    • Oracle
    • Microsoft SQL Server
    • Aurora (AWS Proprietary database)

Note

RDS runs on EC2 instances, so you must choose an instance type

Advantages of using RDS versus deploying a DB on EC2

  • RDS is a managed service:
    • Automated provisioning, OS patching
    • Continuous backups and restore to specific timestamp (Point in Time Restore)!
    • Monitoring dashboards
    • Read replicas for improved read performance
    • Multi AZ setup for DR (Disaster Recovery)
    • Maintenance windows for upgrades
    • Scaling capability (vertical and horizontal)
    • Storage backed by EBS (gp2 or io1)
  • BUT you can’t SSH into your instances

RDS Backups

  • Backups are automatically enabled in RDS
  • Automated backups:
    • Daily full backup of the database (during the maintenance window)
    • Transaction logs are backed-up by RDS every 5 minutes
    • Ability to restore to any point in time (from oldest backup to 5 minutes ago)
    • 7 days retention (can be increased to 35 days)
  • Manual Backups (Snapshot):
    • Manually triggered by the user
    • Backs up the entire DB instance, not just individual databases
    • For single-AZ DB instances there is a brief suspension of I/O
    • For Multi-AZ SQL Server, I/O activity is briefly suspended on primary
    • For Multi-AZ MariaDB, MySQL, Oracle and PostgreSQL the snapshot is taken from the standby
    • Snapshots do not expire (no retention period)

Note

  • The DB instance must be in the available state for backups to occur
  • Restore can be to any point in time during the retention period
  • Restored DB instances are associated with the default DB parameter and option groups

Maintenance Windows

  • Operating system and DB patching can require taking the database offline
  • These tasks take place during a maintenance window
  • By default a weekly maintenance window is configured
  • You can choose your own maintenance window

Storage Auto Scaling

  • Helps you increase storage on your RDS DB instance dynamically
  • When RDS detects you are running out of free database storage, it scales automatically
  • Avoid manually scaling your database storage
  • You have to set Maximum Storage Threshold (maximum limit for DB storage)
  • Automatically modify storage if:
    • Free storage is less than 10% of allocated storage
    • Low-storage lasts at least 5 minutes
    • 6 hours have passed since last modification
  • Useful for applications with unpredictable workloads
  • Supports all RDS database engines (MariaDB, MySQL, PostgreSQL, SQL Server, Oracle)

RDS Read Replicas for read scalability

  • Up to 5 Read Replicas
  • Within AZ, Cross AZ or Cross Region
  • Replication is ASYNC, so reads are eventually consistent
  • Replicas can be promoted to their own DB
  • Applications must update the connection string to leverage read replicas
  • Use Cases
    • You have a production database that is taking on normal load
    • You want to run a reporting application to run some analytics
    • You create a Read Replica to run the new workload there
    • The production application is unaffected
    • Read replicas are used for SELECT (=read) only kind of statements (not INSERT, UPDATE, DELETE)
  • Network cost
    • In AWS there’s a network cost when data goes from one AZ to another
    • For RDS Read Replicas within the same region, you don’t pay that fee
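A minimal boto3 sketch of creating a read replica (instance identifiers are placeholders); replication is asynchronous and the replica gets its own endpoint:

```python
import boto3

rds = boto3.client("rds")

# Read replica of a production instance, e.g. for a reporting workload
rds.create_db_instance_read_replica(
    DBInstanceIdentifier="mydb-replica-1",   # new replica (placeholder name)
    SourceDBInstanceIdentifier="mydb",       # source instance (placeholder name)
)
```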

Tip

Read replicas can be used where the reads are frequently distributed across the majority of the data in the database

RDS Multi AZ (Disaster Recovery)

  • SYNC replication
  • One DNS name – automatic app failover to standby
  • Increase availability
  • Failover in case of loss of AZ, loss of network, instance or storage failure
  • No manual intervention in apps
  • Not used for scaling
  • Multi-AZ replication is free
  • Note: Read Replicas can be set up as Multi-AZ for Disaster Recovery (DR)

RDS Custom

Managed Oracle and Microsoft SQL Server Database with OS and database customization

  • RDS: Automates setup, operation, and scaling of database in AWS
  • Custom: access to the underlying database and OS so you can
    • Configure settings
    • Install patches
    • Enable native features
    • Access the underlying EC2 Instance using SSH or SSM Session Manager
  • Deactivate Automation Mode to perform your customizations; it's best to take a DB snapshot beforehand
  • RDS vs. RDS Custom
    • RDS: entire database and the OS to be managed by AWS
    • RDS Custom: full admin access to the underlying OS and the database

RDS Security - Encryption

  • At rest encryption
    • Possibility to encrypt the master & read replicas with AWS KMS - AES-256 encryption
    • Encryption at rest can be enabled – includes DB storage, backups, read replicas and snapshots
    • You can only enable encryption for an Amazon RDS DB instance when you create it, not after the DB instance is created
    • DB instances that are encrypted can't be modified to disable encryption
    • Uses AES 256 encryption and encryption is transparent with minimal performance impact
    • If the master is not encrypted, the read replicas cannot be encrypted
    • RDS for Oracle and SQL Server is also supported using Transparent Data Encryption (TDE) (may have performance impact)
  • In-flight encryption
    • SSL certificates to encrypt data to RDS in flight
    • Provide SSL options with trust certificate when connecting to database

RDS Encryption Operations

  • Encrypting RDS backups
    • Snapshots of un-encrypted RDS databases are un-encrypted
    • Snapshots of encrypted RDS databases are encrypted
    • Can copy a snapshot into an encrypted one
  • To encrypt an un-encrypted RDS database:
    • Create a snapshot of the un-encrypted database
    • Copy the snapshot and enable encryption for the snapshot
    • Restore the database from the encrypted snapshot
    • Migrate applications to the new database, and delete the old database
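
A boto3 sketch of that snapshot → encrypted copy → restore workflow (identifiers and KMS key are hypothetical):

```python
import boto3

rds = boto3.client("rds")

# 1. Snapshot the unencrypted database
rds.create_db_snapshot(
    DBInstanceIdentifier="mydb",
    DBSnapshotIdentifier="mydb-unencrypted-snap",
)

# 2. Copy the snapshot with encryption enabled
rds.copy_db_snapshot(
    SourceDBSnapshotIdentifier="mydb-unencrypted-snap",
    TargetDBSnapshotIdentifier="mydb-encrypted-snap",
    KmsKeyId="alias/aws/rds",  # hypothetical KMS key
)

# 3. Restore a new, encrypted instance from the encrypted snapshot
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier="mydb-encrypted",
    DBSnapshotIdentifier="mydb-encrypted-snap",
)
```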

RDS Security – Network & IAM

  • Network Security
    • RDS databases are usually deployed within a private subnet, not in a public one
    • RDS security works by leveraging security groups (the same concept as for EC2 instances) – it controls which IP / security group can communicate with RDS
  • Access Management
    • IAM policies help control who can manage AWS RDS (through the RDS API)
    • Traditional Username and Password can be used to login into the database
    • IAM-based authentication can be used to login into RDS MySQL & PostgreSQL
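
With IAM-based authentication, the "password" is a short-lived token generated via the RDS API; a boto3 sketch (endpoint and user are hypothetical):

```python
import boto3

rds = boto3.client("rds")

# Generate a 15-minute authentication token used in place of a password
token = rds.generate_db_auth_token(
    DBHostname="mydb.abc123.us-east-1.rds.amazonaws.com",  # hypothetical endpoint
    Port=3306,
    DBUsername="db_user",  # hypothetical IAM-enabled DB user
)
# Connect with any MySQL/PostgreSQL client over SSL, passing `token` as the password
```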

RDS Security – Summary

  • Encryption at rest:
    • Is done only when you first create the DB instance
    • or: unencrypted DB => snapshot => copy snapshot as encrypted => create DB from snapshot
  • Your responsibility:
    • Check the ports / IP / security group inbound rules in DB’s SG
    • In-database user creation and permissions or manage through IAM
    • Creating a database with or without public access
    • Ensure parameter groups or DB is configured to only allow SSL connections
  • AWS responsibility:
    • No SSH access
    • No manual DB patching
    • No manual OS patching
    • No way to audit the underlying instance

RDS Proxy

  • Fully managed database proxy for RDS
  • Allows apps to pool and share DB connections established with the database
  • Improves database efficiency by reducing the stress on database resources (e.g., CPU, RAM) and minimizing open connections (and timeouts)
  • Serverless, autoscaling, highly available (multi-AZ)
  • Reduces RDS & Aurora failover time by up to 66%
  • Supports RDS (MySQL, PostgreSQL, MariaDB) and Aurora (MySQL, PostgreSQL)
  • Enforce IAM Authentication for DB, and securely store credentials in AWS Secrets Manager
  • RDS Proxy is never publicly accessible (must be accessed from VPC)

RDS Performance Insights

  • Amazon RDS Performance Insights is a database performance tuning and monitoring feature that helps you quickly assess the load on your database, and determine when and where to take action.
  • Performance Insights allows non-experts to detect performance problems with an easy-to-understand dashboard that visualizes database load.
  • It displays the database load in an interactive graph, allowing you to analyze and troubleshoot the database workload.
  • The load is categorized by SQL, waits, hosts, users, and other dimensions, providing detailed information about the sources of the load.

DocumentDB

  • Amazon DocumentDB is a MongoDB-compatible database; MongoDB is used to store, query, and index JSON data (similar "deployment concepts" as Aurora)
  • Fully managed, highly available with replication across 3 AZs
  • Storage scales automatically up to 64 TB without any impact to your application
  • Supports millions of requests per second with up to 15 low-latency read replicas
  • Designed for 99.99% availability and replicates six copies of your data across three AZs
  • Can migrate from MongoDB using the AWS Database Migration Service (AWS DMS)

Amazon Neptune

Fully managed graph database

  • Highly available across 3 AZ, with up to 15 read replicas
  • Build and run applications working with highly connected datasets – optimized for these complex and hard queries
  • Can store up to billions of relations and query the graph with milliseconds latency
  • Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
  • Build and run identity, knowledge, fraud graph, and other applications
  • Deploy high performance graph applications using popular open-source APIs including:
    • Gremlin
    • openCypher
    • SPARQL
  • Offers greater than 99.99% availability
  • Storage is fault-tolerant and self-healing
  • DB volumes grow in increments of 10 GB up to a maximum of 64 TB

Example

A popular graph dataset would be a social network

  • Users have friends
  • Posts have comments
  • Comments have likes from users
  • Users share and like posts...
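
A hedged gremlinpython sketch of querying such a graph on Neptune (endpoint, labels, and edge names are hypothetical):

```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

# Connect to a (hypothetical) Neptune cluster endpoint
conn = DriverRemoteConnection(
    "wss://my-neptune.cluster-abc.us-east-1.neptune.amazonaws.com:8182/gremlin", "g"
)
g = traversal().withRemote(conn)

# Titles of posts liked by alice's friends (User/friend/likes are hypothetical labels)
titles = (
    g.V().has("User", "name", "alice")
    .out("friend")
    .out("likes")
    .values("title")
    .toList()
)
print(titles)
conn.close()
```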

Amazon ElastiCache

  • ElastiCache is a managed service for Redis or Memcached
  • Caches are in-memory databases with high performance and low latency
  • Helps reduce load on databases for read-intensive workloads
  • AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backups
  • Using ElastiCache involves heavy application code changes

Note

ElastiCache nodes run on Amazon EC2 instances, so you must choose an instance family/type

Redis vs Memcached

  • Redis
    • Multi AZ with Auto-Failover
    • Read Replicas to scale reads and have high availability
    • Data Durability using AOF persistence
    • Backup and restore features
  • Memcached
    • Multi-node for partitioning of data (sharding)
    • No high availability (replication)
    • Non persistent
    • No backup and restore
    • Multi-threaded architecture

Cache Security

  • All caches in ElastiCache:
    • Do not support IAM authentication
    • IAM policies on ElastiCache are only used for AWS API-level security
  • Redis AUTH
    • You can set a "password/token" when you create a Redis cluster
    • This is an extra level of security for your cache (on top of security groups)
    • Supports SSL in-flight encryption
  • Memcached
    • Supports SASL-based authentication (advanced)

Lazy Loading / Cache-Aside / Lazy Population

  • Pros
    • Only requested data is cached (the cache isn’t filled up with unused data)
    • Node failures are not fatal (just increased latency to warm the cache)
  • Cons
    • Cache miss penalty that results in 3 round trips, noticeable delay for that request
    • Stale data: data can be updated in the database and outdated in the cache
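
A minimal cache-aside sketch with the redis-py client (endpoint and DB helper are hypothetical):

```python
import json
import redis

r = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)  # hypothetical

def get_user(user_id, db):
    cached = r.get(f"user:{user_id}")
    if cached is not None:         # cache hit
        return json.loads(cached)
    user = db.query_user(user_id)  # cache miss: read from the database (hypothetical helper)
    r.setex(f"user:{user_id}", 3600, json.dumps(user))  # populate the cache with a TTL
    return user
```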

Write Through – Add or Update cache when database is updated

  • Pros:
    • Data in cache is never stale, reads are quick
    • Write penalty instead of a read penalty (each write requires 2 calls, but users tolerate slower writes better than slower reads)
  • Cons:
    • Missing data until it is added / updated in the DB (mitigation: implement a Lazy Loading strategy as well)
    • Cache churn – a lot of the data will never be read
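
The corresponding write-through sketch, under the same assumptions as the cache-aside example above:

```python
import json
import redis

r = redis.Redis(host="my-cache.abc123.use1.cache.amazonaws.com", port=6379)  # hypothetical

def update_user(user_id, data, db):
    db.write_user(user_id, data)                        # 1) write to the database (hypothetical helper)
    r.setex(f"user:{user_id}", 3600, json.dumps(data))  # 2) add/update the cache in the same path
```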

Use Case

  • Data that is relatively static and frequently accessed
  • Applications that are tolerant of stale data
  • Data is slow and expensive to get compared to cache retrieval
  • Require push-button scalability for memory, writes and reads
  • Often used for storing session state

Amazon Keyspaces(for Apache Cassandra)

Apache Cassandra is an open-source NoSQL distributed database

  • A managed Apache Cassandra-compatible database service
  • Serverless, Scalable, highly available, fully managed by AWS
  • Automatically scale tables up/down based on the application’s traffic
  • Tables are replicated 3 times across multiple AZ
  • Single-digit millisecond latency at any scale, 1000s of requests per second
  • Capacity: On-demand mode or provisioned mode with auto-scaling
  • Encryption, backup, Point-In-Time Recovery (PITR) up to 35 days
  • Use cases: store IoT devices info, time-series data, ...
  • Keyspaces enables you to use the Cassandra Query Language (CQL) API code
  • Keyspaces is serverless and fully managed
  • 99.99% availability SLA within an AWS Region

Amazon Managed Blockchain

  • Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
  • Amazon Managed Blockchain is a managed service to:
    • Join public blockchain networks
    • Or create your own scalable private network
  • Compatible with the frameworks Hyperledger Fabric & Ethereum

Amazon QLDB

QLDB stands for "Quantum Ledger Database"(A ledger is a book recording financial transactions)

  • Fully managed, serverless, highly available, replication across 3 AZs
  • Used to review history of all the changes made to your application data over time
  • Immutable system: no entry can be removed or modified, cryptographically verifiable
  • 2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
  • Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
  • Amazon QLDB has a built-in immutable journal that stores an accurate and sequenced entry of every data change
  • The journal is append-only, meaning that data can only be added to a journal, and it cannot be overwritten or deleted
  • Amazon QLDB uses cryptography to create a concise summary of your change history, generated with a cryptographic hash function (SHA-256)
  • Serverless and offers automatic scalability

Amazon Timestream

  • Fully managed, fast, scalable, serverless time series database
  • Automatically scales up/down to adjust capacity
  • Store and analyze trillions of events per day
  • 1,000s of times faster & 1/10th the cost of relational databases
  • Scheduled queries, multi-measure records, SQL compatibility
  • Data storage tiering: recent data kept in memory and historical data kept in a cost-optimized storage
  • Built-in time series analytics functions (helps you identify patterns in your data in near real-time)
  • Encryption in transit and at rest
  • Use cases: IoT apps, operational applications, real-time analytics, ...

Serverless

  • With serverless there are no instances to manage
  • You don’t need to provision hardware
  • There is no management of operating systems or software
  • Capacity provisioning and patching is handled automatically
  • Provides automatic scaling and high availability
  • Can be very cheap!

Example

  • AWS Lambda
  • DynamoDB
  • AWS Cognito
  • AWS API Gateway
  • Amazon S3
  • AWS SNS & SQS
  • AWS Kinesis Data Firehose
  • Aurora Serverless
  • Step Functions
  • Fargate

AWS Lambda

Benefits

  • Easy Pricing:
    • Pay per request and compute time
    • Free tier of 1,000,000 AWS Lambda requests and 400,000 GB-seconds of compute time
  • Integrated with the whole AWS suite of services
  • Event-Driven: functions get invoked by AWS when needed
  • Integrated with many programming languages
  • Easy monitoring through AWS CloudWatch
  • Easy to get more resources per function (up to 10GB of RAM!)
  • Increasing RAM will also improve CPU and network!

Limits(per region)

  • Execution:
    • Memory allocation: 128 MB – 10GB (1 MB increments)
    • Maximum execution time: 900 seconds (15 minutes)
    • Environment variables (4 KB)
    • Disk capacity in the "function container" (in /tmp): 512 MB to 10GB
    • Concurrency executions:
      • 3000 – US West (Oregon), US East (N. Virginia), Europe (Ireland)
      • 1000 – Asia Pacific (Tokyo), Europe (Frankfurt), US East (Ohio)
      • 500 – Other Regions
      • If the concurrency limit is exceeded throttling occurs with error "Rate exceeded" and a 429 "TooManyRequestsException"
  • Deployment:
    • Lambda function deployment size (compressed .zip): 50 MB
    • Size of uncompressed deployment (code + dependencies): 250 MB
    • Can use the /tmp directory to load other files at startup
    • Size of environment variables: 4 KB

Synchronous Invocations

  • Result is returned right away
  • Error handling must happen client side (retries, exponential backoff, etc…)
  • User Invoked:
    • Elastic Load Balancing (Application Load Balancer)
    • Amazon API Gateway
    • Amazon CloudFront (Lambda@Edge)
    • Amazon S3 Batch
  • Service Invoked:
    • Amazon Cognito
    • AWS Step Functions
  • Other Services:
    • Amazon Lex
    • Amazon Alexa
    • Amazon Kinesis Data Firehose

Asynchronous Invocations

  • The events are placed in an Event Queue
  • Lambda attempts to retry on errors
    • 3 tries total
    • 1 minute wait after 1st, then 2 minutes wait
  • Make sure the processing is idempotent (in case of retries)
  • If the function is retried, you will see duplicate logs entries in CloudWatch Logs
  • Can define a DLQ (dead-letter queue) – SNS or SQS – for failed processing (need correct IAM permissions)
  • Asynchronous invocations allow you to speed up the processing if you don’t need to wait for the result (ex: you need 1000 files processed)
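
Invoking a function asynchronously from code is just a flag on the Invoke call; a boto3 sketch (function name and payload are hypothetical):

```python
import json
import boto3

lam = boto3.client("lambda")

# InvocationType="Event" queues the request and returns immediately (HTTP 202)
resp = lam.invoke(
    FunctionName="process-file",               # hypothetical function
    InvocationType="Event",
    Payload=json.dumps({"key": "file-001"}),
)
print(resp["StatusCode"])  # 202 for an async invocation
```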

Lambda Integration with ALB

  • To expose a Lambda function as an HTTP(S) endpoint, you can use an Application Load Balancer (or an API Gateway)
  • The Lambda function must be registered in a target group

  • ALB can support multi-header values (ALB setting)

  • When you enable multi-value headers, HTTP headers and query string parameters that are sent with multiple values are shown as arrays within the AWS Lambda event and response objects.
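
A sketch of the response shape an ALB-targeted function returns when multi-value headers are enabled (handler logic is illustrative):

```python
import json

def lambda_handler(event, context):
    # With multi-value headers enabled, repeated query parameters arrive as arrays
    names = (event.get("multiValueQueryStringParameters") or {}).get("name", [])
    return {
        "statusCode": 200,
        "statusDescription": "200 OK",
        "isBase64Encoded": False,
        "multiValueHeaders": {"Content-Type": ["application/json"]},
        "body": json.dumps({"names": names}),
    }
```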

Lambda in VPC

By default, your Lambda function is launched outside your own VPC (in an AWS-owned VPC)
Therefore it cannot access resources in your VPC (RDS, ElastiCache, internal ELB…)

  • You must define the VPC ID, the Subnets and the Security Groups
  • Lambda will create an ENI (Elastic Network Interface) in your subnets
  • The Lambda execution role needs the AWSLambdaVPCAccessExecutionRole managed policy (grants permissions to create ENIs)
  • A Lambda function in your VPC does not have internet access
  • Deploying a Lambda function in a public subnet does not give it internet access or a public IP
  • Deploying a Lambda function in a private subnet gives it internet access if you have a NAT Gateway / Instance
  • You can use VPC endpoints to privately access AWS services without a NAT

Lambda Function Configuration

  • RAM:
    • From 128MB to 10GB in 1MB increments
    • The more RAM you add, the more vCPU credits you get
    • At 1,792 MB, a function has the equivalent of one full vCPU
    • After 1,792 MB, you get more than one vCPU, and need to use multi-threading in your code to benefit from it (up to 6 vCPUs)
  • If your application is CPU-bound (computation heavy), increase RAM
  • Timeout: default 3 seconds, maximum is 900 seconds (15 minutes)

Amazon Aurora

  • Aurora is a proprietary technology (not open-sourced) from AWS
  • PostgreSQL and MySQL are both supported as Aurora DB (that means your drivers will work as if Aurora was a Postgres or MySQL database)
  • Aurora is "AWS cloud optimized" and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
  • Aurora storage automatically grows in increments of 10GB, up to 128 TB
  • Aurora can have 15 replicas while MySQL has 5, and the replication process is faster (sub 10 ms replica lag)
  • Failover in Aurora is instantaneous. It’s HA (High Availability) native.
  • Aurora costs more than RDS (20% more) – but is more efficient
  • Not in the free tier

Aurora High Availability and Read Scaling

  • 6 copies of your data across 3 AZ:
    • 4 copies out of 6 needed for writes
    • 3 copies out of 6 needed for reads
    • Self healing with peer-to-peer replication
    • Storage is striped across 100s of volumes
  • One Aurora Instance takes writes (master)
  • Automated failover for master in less than 30 seconds
  • Master + up to 15 Aurora Read Replicas serve reads
  • Support for Cross Region Replication

Features

  • Automatic failover
  • Backup and Recovery
  • Isolation and security
  • Industry compliance
  • Push-button scaling
  • Automated Patching with Zero Downtime
  • Advanced Monitoring
  • Routine Maintenance
  • Backtrack: restore data at any point of time without using backups

Aurora Fault Tolerance and Aurora Replicas

  • Fault tolerance across 3 AZs
  • Single logical volume
  • Aurora Replicas scale-out read requests
  • Up to 15 Aurora Replicas with sub-10ms replica lag
  • Aurora Replicas are independent endpoints
  • Can promote Aurora Replica to be a new primary or create new primary
  • Set priority (tiers) on Aurora Replicas to control order of promotion
  • Can use Auto Scaling to add replicas

Custom Endpoints

  • Define a subset of Aurora Instances as a Custom Endpoint
  • Example: Run analytical queries on specific replicas
  • The Reader Endpoint is generally not used after defining Custom Endpoints

Aurora Serverless

  • Automated database instantiation and auto-scaling based on actual usage
  • Good for infrequent, intermittent or unpredictable workloads
  • No capacity planning needed
  • Pay per second, can be more cost-effective

Global Aurora

  • Aurora Cross Region Read Replicas:
    • Useful for disaster recovery
    • Simple to put in place
  • Aurora Global Database (recommended):
    • 1 Primary Region (read / write)
    • Up to 5 secondary (read-only) regions, replication lag is less than 1 second
    • Up to 16 Read Replicas per secondary region
    • Helps for decreasing latency
    • Promoting another region (for disaster recovery) has an RTO of < 1 minute

Aurora Multi-Master

Multi-master clusters are best suited for segmented workloads, such as for multitenant applications.

  • All nodes allow reads/writes
  • Available for MySQL only
  • Up to four read/write nodes
  • Single Region only
  • Cannot have cross-Region replicas
  • Can work with active-active and active-passive workloads
  • Can restart read/write DB instance without impacting other instances

Aurora Serverless

  • Capacity seamlessly scales up and down
  • Each Aurora Capacity Unit (ACU) is 2 GB of memory plus CPU
  • Use Cases
    • Infrequently used applications
    • New applications
    • Variable workloads
    • Unpredictable workloads
    • Development and test databases
    • Multi-tenant applications

Aurora Machine Learning

  • Enables you to add ML-based predictions to your applications via SQL
  • Simple, optimized, and secure integration between Aurora and AWS ML services
  • Supported services
    • Amazon SageMaker (use with any ML model)
    • Amazon Comprehend (for sentiment analysis)
  • You don’t need to have ML experience
  • Use cases: fraud detection, ads targeting, sentiment analysis, product recommendations

Aurora Database Cloning

  • Create a new Aurora DB Cluster from an existing one
  • Faster than snapshot & restore
  • The new DB cluster uses the same cluster volume and data as the original but will change when data updates are made
  • Very fast & cost-effective
  • Useful to create a "staging" database from a "production" database without impacting the production database

RDS & Aurora Security

  • At-rest encryption:
    • Database master & replicas encryption using AWS KMS – must be defined at launch time
    • If the master is not encrypted, the read replicas cannot be encrypted
    • To encrypt an un-encrypted database, go through a DB snapshot & restore as encrypted
  • In-flight encryption: TLS-ready by default; use the AWS TLS root certificates client-side
  • IAM Authentication: IAM roles to connect to your database (instead of username/pw)
  • Security Groups: Control Network access to your RDS / Aurora DB
  • No SSH available except on RDS Custom
  • Audit Logs can be enabled and sent to CloudWatch Logs for longer retention

Aurora Backups

  • Aurora backups are continuous and incremental so you can quickly restore to any point within the backup retention period
  • For Amazon Aurora DB clusters, the default backup retention period is one day regardless of how the DB cluster is created
  • You can't disable automated backups on Aurora. The backup retention period for Aurora is managed by the DB cluster
  • You can also use AWS Backup to manage backups of Amazon Aurora DB clusters
  • You can also take DB cluster snapshots which capture a snapshot of the entire DB cluster
  • Manual snapshots aren't subject to the backup retention period and do not expire
  • For very long-term backups, you can export snapshot data to Amazon S3
  • With Amazon Aurora, you can copy automated backups or manual DB cluster snapshots
  • You can copy a snapshot within the same AWS Region, you can copy a snapshot across AWS Regions, and you can copy shared snapshots
  • Amazon RDS deletes automated backups in several situations:
    • At the end of their retention period
    • When you disable automated backups for a DB cluster
    • When you delete a DB cluster
  • If you want to keep an automated backup for a longer period, copy it to create a manual snapshot

DynamoDB

  • Fully managed, highly available with replication across 3 AZs
  • NoSQL database (key/value) with transaction support
  • Scales to massive workloads, distributed "serverless" database
  • Millions of requests per second, trillions of rows, 100s of TB of storage
  • Fast and consistent in performance
  • Single-digit millisecond latency – low latency retrieval
  • Integrated with IAM for security, authorization and administration
  • Enables event driven programming with DynamoDB Streams
  • Low cost and auto scaling capabilities
  • Standard & Infrequent Access (IA) Table Class

Basics

  • DynamoDB is made of Tables
  • Each table has a Primary Key (must be decided at creation time)
  • Each table can have an infinite number of items (= rows)
  • Each item has attributes (can be added over time – can be null)
  • Maximum size of an item is 400KB
  • Data types supported are:
    • Scalar Types – String, Number, Binary, Boolean, Null
    • Document Types – List, Map
    • Set Types – String Set, Number Set, Binary Set

Primary Keys

  • Option 1: Partition Key (HASH)
    • Partition key must be unique for each item
    • Partition key must be "diverse" so that the data is distributed
  • Option 2: Partition Key + Sort Key (HASH + RANGE)
    • The combination must be unique for each item
    • Data is grouped by partition key
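
A boto3 sketch creating a table with a composite primary key (table/attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="Orders",
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},   # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # On-Demand mode; use ProvisionedThroughput otherwise
)
```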

Read/Write Capacity Modes

Control how you manage your table’s capacity (read/write throughput)

  • Provisioned Mode (default)
    • You specify the number of reads/writes per second
    • You need to plan capacity beforehand
    • Pay for provisioned Read Capacity Units (RCU) & Write Capacity Units (WCU)
    • Possibility to add auto-scaling mode for RCU & WCU
  • On-Demand Mode
    • Read/writes automatically scale up/down with your workloads
    • No capacity planning needed
    • Pay for what you use, more expensive ($$$)
    • Great for unpredictable workloads, steep sudden spikes

Note

  • One Write Capacity Unit (WCU) represents one write per second for an item up to 1 KB in size
  • If the items are larger than 1 KB, more WCUs are consumed
  • One Read Capacity Unit (RCU) represents one Strongly Consistent Read per second, or two Eventually Consistent Reads per second, for an item up to 4 KB in size
  • If the items are larger than 4 KB, more RCUs are consumed
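
A small worked example of that capacity math (the workload numbers are hypothetical):

```python
import math

item_kb = 6          # item size in KB
writes_per_sec = 10
reads_per_sec = 20

wcu = writes_per_sec * math.ceil(item_kb / 1)        # 1 KB per WCU -> 10 * 6 = 60 WCU
rcu_strong = reads_per_sec * math.ceil(item_kb / 4)  # 4 KB per strongly consistent RCU -> 20 * 2 = 40 RCU
rcu_eventual = math.ceil(rcu_strong / 2)             # eventually consistent reads cost half -> 20 RCU
print(wcu, rcu_strong, rcu_eventual)
```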

DynamoDB Time to Live (TTL)

  • TTL lets you define when items in a table expire so that they can be automatically deleted from the database
  • With TTL enabled on a table, you can set a timestamp for deletion on a per-item basis
  • No extra cost and does not use WCU / RCU
  • Helps reduce storage and manage the table size over time
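
A boto3 sketch of enabling TTL and writing an item that carries its own expiry (table/attribute names are hypothetical):

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Enable TTL on the table, using the "expires_at" attribute
dynamodb.update_time_to_live(
    TableName="Orders",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Each item carries its expiry as an epoch timestamp (Number)
dynamodb.put_item(
    TableName="Orders",
    Item={
        "customer_id": {"S": "c-1"},
        "order_date": {"S": "2023-01-01"},
        "expires_at": {"N": str(int(time.time()) + 86400)},  # expire in ~24h
    },
)
```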

Throttling

  • If we exceed provisioned RCUs or WCUs, we get "ProvisionedThroughputExceededException"
  • Reasons:
    • Hot Keys – one partition key is being read too many times
    • Hot Partitions
    • Very large items (RCU and WCU consumption depends on item size)
  • Solutions:
    • Exponential backoff when exception is encountered (already in SDK)
    • Distribute partition keys as much as possible
    • If RCU issue, we can use DynamoDB Accelerator (DAX)

DynamoDB Accelerator - DAX

  • Fully Managed in-memory cache for DynamoDB
  • 10x performance improvement – single-digit millisecond latency to microseconds latency – when accessing your DynamoDB tables
  • Secure, highly scalable & highly available
  • Can be a read-through cache and a write-through cache
  • Used to improve READ and WRITE performance
  • You do not need to modify application logic, since DAX is compatible with existing DynamoDB API calls
  • Difference with ElastiCache at the CCP level:
    • DAX is only used for and is integrated with DynamoDB
    • ElastiCache can be used for other databases

DynamoDB Streams

  • Captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours
  • Can configure the information that is written to the stream:
    • KEYS_ONLY — Only the key attributes of the modified item
    • NEW_IMAGE — The entire item, as it appears after it was modified
    • OLD_IMAGE — The entire item, as it appeared before it was modified
    • NEW_AND_OLD_IMAGES — Both the new and the old images of the item

Global Tables

  • Make a DynamoDB table accessible with low latency in multiple regions
  • Active-Active replication (read/write to any AWS Region)
  • Applications can READ and WRITE to the table in any region
  • Must enable DynamoDB Streams as a pre-requisite

Integration with Amazon S3

  • Export to S3 (requires Point-in-Time Recovery enabled):
    • Works for any point of time in the last 35 days
    • Doesn’t affect the read capacity of your table
    • Perform data analysis on top of DynamoDB
    • Retain snapshots for auditing
    • ETL on top of S3 data before importing back into DynamoDB
    • Export in DynamoDB JSON or ION format
  • Import from S3:
    • Import CSV, DynamoDB JSON or ION format
    • Doesn’t consume any write capacity
    • Creates a new table
    • Import errors are logged in CloudWatch Logs

Local Secondary Index (LSI)

  • Alternative Sort Key for your table (same Partition Key as that of base table)
  • The Sort Key consists of one scalar attribute (String, Number, or Binary)
  • Up to 5 Local Secondary Indexes per table
  • Must be defined at table creation time
  • Attribute Projections – can contain some or all the attributes of the base table (KEYS_ONLY, INCLUDE, ALL)

Global Secondary Index (GSI)

  • Alternative Primary Key (HASH or HASH+RANGE) from the base table
  • Speed up queries on non-key attributes
  • The Index Key consists of scalar attributes (String, Number, or Binary)
  • Attribute Projections – some or all the attributes of the base table (KEYS_ONLY, INCLUDE, ALL)
  • Must provision RCUs & WCUs for the index
  • Can be added/modified after table creation

Indexes and Throttling

  • Global Secondary Index (GSI):
    • If the writes are throttled on the GSI, then the main table will be throttled!
    • Even if the WCU on the main tables are fine
    • Choose your GSI partition key carefully!
    • Assign your WCU capacity carefully!
  • Local Secondary Index (LSI):
    • Uses the WCUs and RCUs of the main table
    • No special throttling considerations

CloudFront Functions

  • Lightweight functions written in JavaScript
  • For high-scale, latency-sensitive CDN customizations
  • Sub-ms startup times, millions of requests/second
  • Used to change Viewer requests and responses:
    • Viewer Request: after CloudFront receives a request from a viewer
    • Viewer Response: before CloudFront forwards the response to the viewer
  • Native feature of CloudFront (manage code entirely within CloudFront)

Amazon API Gateway

  • Fully managed service for developers to easily create, publish, maintain, monitor, and secure APIs
  • Serverless and scalable
  • Supports RESTful APIs and WebSocket APIs
  • Support for security, user authentication, API throttling, API keys, monitoring...
  • Handle API versioning (v1, v2...)
  • Handle different environments (dev, test, prod...)
  • Handle security (Authentication and Authorization)
  • Create API keys, handle request throttling
  • Swagger / OpenAPI import to quickly define APIs
  • Transform and validate requests and responses
  • Generate SDK and API specifications
  • Cache API responses

Endpoint Types

  • Edge-Optimized (default): For global clients
    • Requests are routed through the CloudFront Edge locations (improves latency)
    • The API Gateway still lives in only one region
  • Regional:
    • For clients within the same region
    • Could manually combine with CloudFront (more control over the caching strategies and the distribution)
  • Private:
    • Can only be accessed from your VPC using an interface VPC endpoint (ENI)
    • Use a resource policy to define access

Caching

  • You can add caching to API calls by provisioning an Amazon API Gateway cache and specifying its size in gigabytes
  • Caching allows you to cache the endpoint's response
  • Caching can reduce the number of calls to the backend and improve the latency of requests to the API

Security

  • User Authentication through
    • IAM Roles (useful for internal applications)
    • Cognito (identity for external users – example mobile users)
    • Custom Authorizer (your own logic)
  • Custom Domain Name HTTPS security through integration with AWS Certificate Manager (ACM)
    • If using Edge-Optimized endpoint, then the certificate must be in us-east-1
    • If using Regional endpoint, the certificate must be in the API Gateway region
    • Must setup CNAME or A-alias record in Route 53

Deployment Stages

  • Making changes in the API Gateway does not mean they’re effective
  • You need to make a "deployment" for them to be in effect
  • It’s a common source of confusion
  • Changes are deployed to "Stages" (as many as you want)
  • Use the naming you like for stages (dev, test, prod)
  • Each stage has its own configuration parameters
  • Stages can be rolled back as a history of deployments is kept

Stage Variables

  • Stage variables are like environment variables for API Gateway
  • Use them for frequently changing configuration values
  • They can be used in:
    • Lambda function ARN
    • HTTP Endpoint
    • Parameter mapping templates
  • Use cases:
    • Configure HTTP endpoints your stages talk to (dev, test, prod…)
    • Pass configuration parameters to AWS Lambda through mapping templates
  • Stage variables are passed to the "context" object in AWS Lambda

Integration Types

  • Integration Type MOCK
    • API Gateway returns a response without sending the request to the backend
  • Integration Type HTTP / AWS (Lambda & AWS Services)
    • You must configure both the integration request and integration response
    • Setup data mapping using mapping templates for the request & response
  • Integration Type AWS_PROXY (Lambda Proxy):
    • The incoming request from the client is the input to Lambda
    • The function is responsible for the logic of request / response
    • No mapping template; headers, query string parameters, etc. are passed as arguments
  • Integration Type HTTP_PROXY
    • No mapping template
    • The HTTP request is passed to the backend
    • The HTTP response from the backend is forwarded by API Gateway

Mapping Templates (AWS & HTTP Integration)

  • Mapping templates can be used to modify request / responses
  • Rename / Modify query string parameters
  • Modify body content
  • Add headers
  • Uses Velocity Template Language (VTL): for loop, if etc…
  • Filter output results (remove unnecessary data)

Throttling

  • API Gateway sets a limit on a steady-state rate and a burst of request submissions against all APIs in your account
  • Limits:
    • By default API Gateway limits the steady-state request rate to 10,000 requests per second
    • The maximum concurrent requests is 5,000 requests across all APIs within an AWS account
    • If you go over 10,000 requests per second or 5,000 concurrent requests you will receive a 429 Too Many Requests error response
  • Upon catching such exceptions, the client can resubmit the failed requests in a rate-limited fashion, while complying with the API Gateway throttling limits

Usage Plan & API Keys

  • Usage Plan:
    • who can access one or more deployed API stages and methods
    • how much and how fast they can access them
    • uses API keys to identify API clients and meter access
    • configure throttling limits and quota limits that are enforced on individual client
  • API Keys:
    • alphanumeric string values to distribute to your customers
    • Ex: WBjHxNtoAb4WPKBC7cGm64CBibIb24b4jt8jJHo9
    • Can use with usage plans to control access
    • Throttling limits are applied to the API keys
    • Quota limits set the overall maximum number of requests

Amazon Cognito

AWS Service that offers Authentication and Authorization features

  • Give users an identity to interact with our web or mobile application
  • Allows you to add user registration, sign in, and access control
  • Scalable and Highly available supporting millions of users
  • Supports standards-based identity providers (OAuth 2.0, OIDC, SAML)
  • Useful in a variety of contexts:
    • Keeping an active directory of Users
    • Securing APIs
    • Providing temporary access to AWS resources

Cognito User Pools

Create a serverless database of users for your web & mobile apps
User pools help you track user device, location, and IP address, and adapt to sign-in requests of different risk levels

  • Features:
    • Simple login: Username (or email) / password combination
    • Password reset
    • Email & Phone Number Verification
    • Multi-factor authentication (MFA)
    • Federated Identities: users from Facebook, Google, SAML…
    • Feature: block users if their credentials are compromised elsewhere
    • Login sends back a JSON Web Token (JWT)
  • Hosted Authentication UI
    • Cognito has a hosted authentication UI that you can add to your app to handle sign up and sign-in workflows
    • Using the hosted UI, you have a foundation for integration with social logins, OIDC or SAML
    • Can customize with a custom logo, custom CSS, and the URL (not the underlying JavaScript)

Tip

Groups can be created with Amazon Cognito user pools to manage permissions to different types of users.
While accessing the application, users get authenticated by Amazon Cognito user pool.
Users in separate groups in Cognito will be mapped to separate IAM roles & receive tokens.
These tokens are validated by API Gateway for granting access to application or to backend Lambda function.

Cognito Identity Pools (Federated Identity)

  • Get identities for "users" so they obtain temporary AWS credentials
  • Your identity pool (e.g., identity source) can include:
    • Public Providers (Login with Amazon, Facebook, Google, Apple)
    • Users in an Amazon Cognito user pool
    • OpenID Connect Providers & SAML Identity Providers
    • Developer Authenticated Identities (custom login server)
    • Cognito Identity Pools allow for unauthenticated (guest) access
  • Users can then access AWS services directly or through API Gateway
    • The IAM policies applied to the credentials are defined in Cognito
    • They can be customized based on the user_id for fine grained control

AWS Step Functions

AWS Step Functions is used to build distributed applications as a series of steps in a visual workflow

  • You can quickly build and run state machines to execute the steps of your application
  • Features: sequence, parallel, conditions, timeouts, error handling, ...
  • How it works:
    • Define the steps of your workflow in the JSON-based Amazon States Language. The visual console automatically graphs each step in the order of execution
    • Start an execution to visualize and verify the steps of your application are operating as intended. The console highlights the real-time status of each step and provides a detailed history of every execution
    • AWS Step Functions operates and scales the steps of your application and underlying compute for you to help ensure your application executes reliably under increasing demand
  • Can integrate with EC2, ECS, On-premises servers, API Gateway, SQS queues, etc...
  • Possibility of implementing human approval feature
  • Use cases: order fulfillment, data processing, web applications, any workflow

Task States

  • Do some work in your state machine
  • Invoke one AWS service
    • Can invoke a Lambda function
    • Run an AWS Batch job
    • Run an ECS task and wait for it to complete
    • Insert an item into DynamoDB
    • Publish message to SNS, SQS
    • Launch another Step Function workflow…
  • Run one Activity
    • EC2, Amazon ECS, on-premises
    • Activities poll the Step functions for work
    • Activities send results back to Step Functions

States

  • Choice State - Test for a condition to send to a branch (or default branch)
  • Fail or Succeed State - Stop execution with failure or success
  • Pass State - Simply pass its input to its output or inject some fixed data, without performing work.
  • Wait State - Provide a delay for a certain amount of time or until a specified time/date.
  • Map State - Dynamically iterate steps.
  • Parallel State - Begin parallel branches of execution.

Error Handling in Step Functions

  • Any state can encounter runtime errors for various reasons:
    • State machine definition issues (for example, no matching rule in a Choice state)
    • Task failures (for example, an exception in a Lambda function)
    • Transient issues (for example, network partition events)
  • Use Retry (to retry failed state) and Catch (transition to failure path) in the State Machine to handle the errors instead of inside the Application Code
  • Predefined error codes:
    • States.ALL : matches any error name
    • States.Timeout: Task ran longer than TimeoutSeconds or no heartbeat received
    • States.TaskFailed: execution failure
    • States.Permissions: insufficient privileges to execute code
  • A state may report its own errors
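
A sketch of Retry/Catch in an Amazon States Language definition, deployed with boto3 (ARNs and names are hypothetical):

```python
import json
import boto3

definition = {
    "StartAt": "ProcessTask",
    "States": {
        "ProcessTask": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process",  # hypothetical
            "Retry": [{
                "ErrorEquals": ["States.Timeout", "States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "HandleFailure"}],
            "End": True,
        },
        "HandleFailure": {"Type": "Fail", "Error": "JobFailed", "Cause": "Task failed after retries"},
    },
}

boto3.client("stepfunctions").create_state_machine(
    name="demo-machine",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsRole",  # hypothetical
)
```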

AWS Amplify

  • Set of tools to get started with creating mobile and full-stack web applications
  • "Elastic Beanstalk for mobile and web applications"
  • Build web and mobile backends, and web frontend UIs
  • Must-have features such as data storage, authentication, and machine learning, all powered by AWS services
  • Front-end libraries with ready-to-use components for React.js, Vue, React Native, iOS, Android, Flutter, etc…
  • Incorporates AWS best practices for reliability, security, and scalability
  • Build and deploy with the Amplify CLI or Amplify Studio
  • AWS Amplify Hosting is a fully managed CI/CD and hosting service for fast, secure, and reliable static and server-side rendered apps

Components

  • Amplify Studio
    • Visually build a full-stack app, both front-end UI and a backend
    • Use the visual interface to define a data model, user authentication, and file storage without backend expertise
    • Easily add AWS services not available within Amplify Studio using the AWS Cloud Development Kit (CDK)
  • Amplify CLI - Configure an Amplify backend with a guided CLI workflow
  • Amplify Libraries - Connect your app to existing AWS Services (Cognito, S3 and more)
  • Amplify Hosting - Host secure, reliable, fast web apps or websites via the AWS content delivery network

AWS AppSync

  • AWS AppSync is a fully managed service that makes it easy to develop GraphQL APIs
  • Applications can securely access, manipulate, and receive real-time updates from multiple data sources such as databases or APIs
  • AWS AppSync automatically scales a GraphQL API execution engine up and down to meet API request volumes
  • Uses GraphQL, a data language that enables client apps to fetch, change and subscribe to data from servers
  • AWS AppSync lets you specify which portions of your data should be available in a real-time manner using GraphQL Subscriptions
  • AWS AppSync supports AWS Lambda, Amazon DynamoDB, and Amazon Elasticsearch
  • Server-side data caching capabilities reduce the need to directly access data sources
  • AppSync is fully managed and eliminates the operational overhead of managing cache clusters

AWS Batch

  • Fully managed batch processing at any scale
  • Efficiently run 100,000s of computing batch jobs on AWS
  • A "batch" job is a job with a start and an end (opposed to continuous)
  • Batch will dynamically launch EC2 instances or Spot Instances
  • AWS Batch provisions the right amount of compute / memory
  • You submit or schedule batch jobs and AWS Batch does the rest!
  • Batch jobs are defined as Docker images and run on ECS
  • Helpful for cost optimizations and focusing less on the infrastructure

Batch VS Lambda

  • Lambda
    • Time limit
    • Limited runtimes
    • Limited temporary disk space
    • Serverless
  • Batch
    • No time limit
    • Any runtime as long as it’s packaged as a Docker image
    • Rely on EBS / instance store for disk space
    • Relies on EC2 (can be managed by AWS)

Amazon Lightsail

  • Virtual servers, storage, databases, and networking
  • Low & predictable pricing
  • Simpler alternative to using EC2, RDS, ELB, EBS, Route 53...
  • Great for people with little cloud experience!
  • Can setup notifications and monitoring of your Lightsail resources
  • Use cases:
    • Simple web applications (has templates for LAMP, Nginx, MEAN, Node.js...)
    • Websites (templates for WordPress, Magento, Plesk, Joomla)
    • Dev / Test environments
  • Has high availability but no auto-scaling, limited AWS integrations

Data & Analytics

Amazon Athena

  • Serverless query service to analyze data stored in Amazon S3
  • Uses standard SQL language to query the files
  • Supports CSV, TSV, JSON, ORC, Avro, and Parquet (built on Presto)
  • Pricing: $5.00 per TB of data scanned
  • Commonly used with Amazon QuickSight for reporting / dashboards
  • Use compressed or columnar data for cost-savings (less scan)
  • Use cases: Business intelligence / analytics / reporting, analyze & query VPC Flow Logs, ELB Logs, CloudTrail trails, etc...
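
Running a query from code is a single boto3 call; a sketch (database, table, and results bucket are hypothetical):

```python
import boto3

athena = boto3.client("athena")

resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM elb_logs GROUP BY status",  # hypothetical table
    QueryExecutionContext={"Database": "default"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},    # hypothetical bucket
)
print(resp["QueryExecutionId"])  # poll get_query_execution / get_query_results with this ID
```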

Federated Query

  • Allows you to run SQL queries across data stored in relational, non-relational, object, and custom data sources (AWS or on-premises)
  • Uses Data Source Connectors that run on AWS Lambda to run Federated Queries (e.g., CloudWatch Logs, DynamoDB, RDS, ...)
  • Store the results back in Amazon S3

Optimizing Athena for Performance

  • Partition your data
  • Bucket your data – bucket the data within a single partition
  • Use Compression – AWS recommend using either Apache Parquet or Apache ORC
  • Optimize file sizes
  • Optimize columnar data store generation – Apache Parquet and Apache ORC are popular columnar data stores
  • Optimize ORDER BY and Optimize GROUP BY
  • Use approximate functions
  • Only include the columns that you need

Redshift

  • Redshift is based on PostgreSQL, but it’s not used for OLTP
  • It’s OLAP – online analytical processing (analytics and data warehousing)
  • Load data once every hour, not every second
  • 10x better performance than other data warehouses, scale to PBs of data
  • Columnar storage of data (instead of row based)
  • Massively Parallel Query Execution (MPP), highly available
  • Pay as you go based on the instances provisioned
  • Has a SQL interface for performing the queries
  • BI tools such as AWS Quicksight or Tableau integrate with it
  • vs Athena: faster queries / joins / aggregations thanks to indexes

Cluster

  • Leader node: for query planning, results aggregation
  • Compute node: for performing the queries, send results to leader
  • You provision the node size in advance
  • You can use Reserved Instances for cost savings

Snapshots & DR

  • Redshift has no "Multi-AZ" mode
  • Snapshots are point-in-time backups of a cluster, stored internally in S3
  • Snapshots are incremental (only what has changed is saved)
  • You can restore a snapshot into a new cluster
  • Automated: every 8 hours, every 5 GB, or on a schedule. Set retention between 1 to 35 days
  • Manual: snapshot is retained until you delete it
  • You can configure Amazon Redshift to automatically copy snapshots (automated or manual) of a cluster to another AWS Region

Spectrum

  • Query data that is already in S3 without loading it
  • Must have a Redshift cluster available to start the query
  • The query is then submitted to thousands of Redshift Spectrum nodes

RedShift Use Cases

  • Perform complex queries on massive collections of structured and semi-structured data and get fast performance
  • Frequently accessed data that needs a consistent, highly structured format
  • Use Spectrum for direct access of S3 objects in a data lake
  • Managed data warehouse solution with:
    • Automated provisioning, configuration and patching
    • Data durability with continuous backup to S3
    • Scales with simple API calls
    • Exabyte scale query capability

OpenSearch Service

Amazon OpenSearch Service is the successor to Amazon Elasticsearch Service

  • In DynamoDB, queries only exist by primary key or indexes... -> with OpenSearch, you can search any field, even with partial matches
  • It’s common to use OpenSearch as a complement to another database
  • Search, visualize, and analyze text and unstructured data
  • Deploy nodes and replicas across AZs
  • OpenSearch requires a cluster of instances (not serverless)
  • Does not support SQL (it has its own query language)
  • Ingestion from Kinesis Data Firehose, AWS IoT, and CloudWatch Logs
  • Security through Cognito & IAM, KMS encryption, TLS
  • Comes with OpenSearch Dashboards (visualization)

Service Deployment

  • Clusters are created (Management Console, API, or CLI)
  • Clusters are also known as OpenSearch Service domains
  • You specify the number of instances and instance types
  • Storage options include UltraWarm or Cold storage

OpenSearch in an Amazon VPC

  • Clusters can be deployed in a VPC for secure intra-VPC communications
  • VPN or proxy required to connect from the internet (public domains are directly accessible)
  • Cannot use IP-based access policies
  • Limitations of VPC deployments:
    • You can’t switch from VPC to a public endpoint. The reverse is also true
    • You can’t launch your domain within a VPC that uses dedicated tenancy
    • After you place a domain within a VPC, you can’t move it to a different VPC, but you can change the subnets and security group settings

The ELK Stack

  • ELK stands for Elasticsearch, Logstash, and Kibana
  • This is a popular combination of projects
  • Aggregate logs from systems and applications, analyze these logs, and create visualizations
  • Use cases include:
    • Visualizing application and infrastructure monitoring data
    • Troubleshooting
    • Security analytics

Access Control

  • Resource-based policies – often called a domain access policy
  • Identity-based policies – attached to users or roles (principals)
  • IP-based policies – Restrict access to one or more IP addresses or CIDR blocks
  • Fine-grained access control – Provides:
    • Role-based access control
    • Security at the index, document, and field level
    • OpenSearch Dashboards multi-tenancy
    • HTTP basic authentication for OpenSearch and OpenSearch Dashboards

Tip

Authentication options include:

  • Federation using SAML to on-premises directories
  • Amazon Cognito and social identity providers

Best Practices

  • Deploy OpenSearch data instances across three Availability Zones (AZs) for the best availability
  • Provision instances in multiples of three for equal distribution across AZs
  • If three AZs are not available use two AZs with equal numbers of instances
  • Use three dedicated master nodes
  • Configure at least one replica for each index
  • Apply restrictive resource-based access policies to the domain (or use fine-grained access control)
  • Create the domain within an Amazon VPC
  • For sensitive data enable node-to-node encryption and encryption at rest

Amazon EMR

EMR stands for "Elastic MapReduce"

  • EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
  • The clusters can be made of hundreds of EC2 instances
  • Also supports Apache Spark, HBase, Presto, Flink...
  • EMR takes care of all the provisioning and configuration
  • Auto-scaling and integrated with Spot instances
  • Use cases: data processing, machine learning, web indexing, big data...

Node types & purchasing

  • Master Node: Manage the cluster, coordinate, manage health – long running
  • Core Node: Run tasks and store data – long running
  • Task Node (optional): Just to run tasks – usually Spot
  • Purchasing options:
    • On-demand: reliable, predictable, won’t be terminated
    • Reserved (min 1 year): cost savings (EMR will automatically use if available)
    • Spot Instances: cheaper, can be terminated, less reliable
  • Can have long-running cluster, or transient (temporary) cluster

Amazon QuickSight

  • Serverless machine learning-powered business intelligence service to create interactive dashboards
  • Fast, automatically scalable, embeddable, with per-session pricing
  • Use cases:
    • Business analytics
    • Building visualizations
    • Perform ad-hoc analysis
    • Get business insights using data
  • Integrated with RDS, Aurora, Athena, Redshift, S3...
  • In-memory computation using SPICE engine if data is imported into QuickSight
  • Enterprise edition: Possibility to setup Column-Level security (CLS)

AWS Glue

  • Managed extract, transform, and load (ETL) service
  • Useful to prepare and transform data for analytics
  • AWS Glue runs the ETL jobs on a fully managed, scale-out Apache Spark environment
  • AWS Glue discovers data and stores the associated metadata (e.g., table definitions and schemas) in the AWS Glue Data Catalog
  • Works with data lakes (e.g. data on S3), data warehouses (including RedShift), and data stores (including RDS or EC2 databases)
  • Fully serverless service
  • Glue Job Bookmarks: prevent re-processing old data
  • Glue Elastic Views:
    • Combine and replicate data across multiple data stores using SQL
    • No custom code, Glue monitors for changes in the source data, serverless
    • Leverages a "virtual table" (materialized view)
  • Glue DataBrew: clean and normalize data using pre-built transformation
  • Glue Studio: new GUI to create, run and monitor ETL jobs in Glue
  • Glue Streaming ETL (built on Apache Spark Structured Streaming): compatible with Kinesis Data Streams, Kafka, and MSK (managed Kafka)

Tip

  • You can use a crawler to populate the AWS Glue Data Catalog with tables
  • A crawler can crawl multiple data stores in a single run
  • Upon completion, the crawler creates or updates one or more tables in your Data Catalog

AWS Lake Formation

  • Data lake = central place to have all your data for analytics purposes
  • Fully managed service that makes it easy to set up a data lake in days
  • Discover, cleanse, transform, and ingest data into your Data Lake
  • It automates many complex manual steps (collecting, cleansing, moving, cataloging data, ...) and de-duplicate (using ML Transforms)
  • Combine structured and unstructured data in the data lake
  • Out-of-the-box source blueprints: S3, RDS, Relational & NoSQL DB...
  • Fine-grained Access Control for your applications (row and column-level)
  • Built on top of AWS Glue

AWS Data Exchange

  • AWS Data Exchange is a data marketplace with over 3,000 products from 250+ providers
  • AWS Data Exchange supports Data Files, Data Tables, and Data APIs
  • Consume directly into data lakes, applications, analytics, and machine learning models
  • Automatically export new or updated data sets to Amazon S3
  • Query data tables with AWS Data Exchange for Amazon Redshift
  • Use AWS-native authentication and governance, AWS SDKs, and consistent API documentation

AWS Data Pipeline

  • AWS Data Pipeline is a managed ETL (Extract-Transform-Load) service
  • Process and move data between different AWS compute and storage services
  • Data sources can also be on-premises
  • Data can be processed and transformed
  • Results can be loaded to services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR

Amazon Managed Streaming for Apache Kafka (MSK)

  • Amazon MSK is used for ingesting and processing streaming data in real-time
  • Build and run Apache Kafka applications
  • It is a fully managed service
  • Provisions, configures, and maintains Apache Kafka clusters and Apache ZooKeeper nodes
  • Security levels include:
    • VPC network isolation
    • AWS IAM for control-plane API authorization
    • Encryption at rest
    • TLS encryption in-transit
    • TLS based certificate authentication
    • SASL/SCRAM authentication secured by AWS Secrets Manager

Deploying and Managing

AWS CLI

  • --dry-run: simulates the API call without actually running it
  • sts decode-authorization-message: decodes an encoded authorization failure message
  • http://169.254.169.254/latest/meta-data: EC2 Instance Metadata endpoint
  • aws sts get-session-token: creates a temporary session to use MFA with the CLI

AWS CLI Credentials Provider Chain

The CLI will look for credentials in this order:

  1. Command line options – --region, --output, and --profile
  2. Environment variables – AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN
  3. CLI credentials file – created by aws configure; ~/.aws/credentials on Linux / macOS & C:\Users\USERNAME\.aws\credentials on Windows
  4. CLI configuration file – created by aws configure; ~/.aws/config on Linux / macOS & C:\Users\USERNAME\.aws\config on Windows
  5. Container credentials – for ECS tasks
  6. Instance profile credentials – for EC2 Instance Profiles

AWS Cloud Development Kit (CDK)

  • Define your cloud infrastructure using a familiar language:
    • JavaScript/TypeScript
    • Python
    • Java
    • .NET
  • The code is "compiled" into a CloudFormation template (JSON/YAML)
  • You can therefore deploy infrastructure and application runtime code together
  • Great for Lambda functions
  • Great for Docker containers in ECS / EKS
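
A minimal CDK v2 app in Python (stack and bucket IDs are illustrative):

```python
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class DemoStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        s3.Bucket(self, "DemoBucket", versioned=True)  # one construct -> CloudFormation resources

app = App()
DemoStack(app, "DemoStack")
app.synth()  # "compiles" the app into a CloudFormation template under cdk.out/
```

Running `cdk deploy` then hands the synthesized template to CloudFormation.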

CloudFormation

CloudFormation is a declarative way of outlining your AWS Infrastructure

Benefits

  • Infrastructure as code
    • No resources are manually created, which is excellent for control
    • The code can be version controlled
    • Changes to the infrastructure are reviewed through code
  • Cost
    • Each resource within the stack is tagged with an identifier so you can easily see how much a stack costs you
    • You can estimate the costs of your resources using the CloudFormation template
    • Savings strategy: in Dev, you could automate deletion of stacks at 6 PM and re-creation at 9 AM, safely
  • Productivity
    • Ability to destroy and re-create an infrastructure on the cloud on the fly
    • Automated generation of Diagram for your templates!
    • Declarative programming (no need to figure out ordering and orchestration)
  • Separation of concern: create many stacks for many apps, and many layers
  • Don’t re-invent the wheel
    • Leverage existing templates on the web!
    • Leverage the documentation
  • Supports (almost) all AWS resources:
    • You can use "custom resources" for resources that are not supported

Components

  • Template - The JSON or YAML text file that contains the instructions for building out the AWS environment
  • Stacks - The entire environment described by the template and created, updated, and deleted as a single unit
  • StackSets - AWS CloudFormation StackSets extends the functionality of stacks by enabling you to create, update, or delete stacks across multiple accounts and regions with a single operation
  • Change Sets - A summary of proposed changes to your stack that will allow you to see how those changes might impact your existing resources before implementing them

Templates

  • A template is a YAML- or JSON-formatted text file used to describe the end state of the infrastructure you are provisioning or changing
  • After creating the template, you upload it to CloudFormation directly or using Amazon S3
  • CloudFormation reads the template and makes the API calls on your behalf
  • The resulting resources are called a "Stack"
  • Logical IDs – used to reference resources within the template
  • Physical IDs – identify resources outside of AWS CloudFormation templates, but only after the resources have been created
  • The required Resources section declares the AWS resources that you want to include in the stack, such as an Amazon EC2 instance or an Amazon S3 bucket
  • It is the only mandatory section (see the minimal sketch after this list)
  • Resources are declared and can reference each other
  • Use the optional Parameters section to customize your templates
  • Parameters enable you to input custom values to your template each time you create or update a stack
  • Useful for template reuse
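
A minimal template sketch showing the required Resources section and an optional Parameter (the names are illustrative):

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: Minimal example template
Parameters:
  EnvName:                  # custom value supplied at stack create/update time
    Type: String
    Default: dev
Resources:
  MyBucket:                 # Logical ID, used to reference the resource
    Type: AWS::S3::Bucket
```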

The optional Mappings section matches a key to a corresponding set of named values

Tip

With mappings you can set values based on a region. You can create a mapping that uses the region name as a key and contains the values you want to specify for each specific region.
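
For example, a region-keyed mapping resolved with Fn::FindInMap (the AMI IDs are placeholders):

```yaml
Mappings:
  RegionMap:
    us-east-1:
      AMI: ami-11111111     # placeholder AMI ID
    eu-west-1:
      AMI: ami-22222222     # placeholder AMI ID
Resources:
  MyInstance:
    Type: AWS::EC2::Instance
    Properties:
      # Picks the AMI for whichever region the stack is deployed in
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", AMI]
      InstanceType: t2.micro
```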

The optional Outputs section declares output values that you can import into other stacks (to create cross-stack references), return in response (to describe stack calls), or view on the AWS CloudFormation console

The optional Conditions section contains statements that define the circumstances under which entities are created or configured

  • The optional Transform section specifies one or more macros that AWS CloudFormation uses to process your template
  • The transform section can be used to reference additional code stored in S3, such as Lambda code or reusable snippets of CloudFormation code
  • The AWS::Serverless transform specifies the version of the AWS Serverless Application Model (AWS SAM) to use
  • This model defines the AWS SAM syntax that you can use and how AWS CloudFormation processes it
  • The AWS::Include transform works with template snippets that are stored separately from the main AWS CloudFormation template
  • With the DependsOn attribute you can specify that the creation of a specific resource follows another
  • When you add a DependsOn attribute to a resource, that resource is created only after the creation of the resource specified in the DependsOn attribute
  • Use a WaitCondition to ensure resources are ready
  • You can use a wait condition for situations like the following:
    • To coordinate stack resource creation with configuration actions that are external to the stack creation
    • To track the status of a configuration process

Creation Policy

  • Use the CreationPolicy attribute when you want to wait on resource configuration actions before stack creation proceeds
  • You can associate the CreationPolicy attribute with a resource to prevent its status from reaching CREATE_COMPLETE until AWS CloudFormation receives a specified number of success signals or the timeout period is exceeded
  • To signal a resource, you can use the cfn-signal helper script or the SignalResource API
  • AWS CloudFormation publishes valid signals to the stack events so that you can track the number of signals sent
  • The following CloudFormation resources support creation policies:
    • AWS::AutoScaling::AutoScalingGroup
    • AWS::EC2::Instance
    • AWS::CloudFormation::WaitCondition
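
A sketch of a CreationPolicy on an Auto Scaling group (signal count and timeout are illustrative; other required properties are omitted):

```yaml
Resources:
  WebServerGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    CreationPolicy:
      ResourceSignal:
        Count: 3            # wait for 3 success signals (e.g. from cfn-signal)
        Timeout: PT15M      # or fail after 15 minutes (ISO 8601 duration)
    # ... required Auto Scaling group properties omitted
```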

Deletion Policy

  • With the DeletionPolicy attribute you can preserve or (in some cases) back up a resource when its stack is deleted
  • You specify a DeletionPolicy attribute for each resource that you want to control
  • If a resource has no DeletionPolicy attribute, AWS CloudFormation deletes the resource by default
  • Deletion policies can be specified as:
    • DeletionPolicy=Retain – preserves the resources
    • DeletionPolicy=Snapshot – takes a final snapshot (e.g. for EBS volumes, ElastiCache, RDS)
    • DeletionPolicy=Delete – default, attempts to delete the resources
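
For example (the resource is illustrative; required properties are omitted):

```yaml
Resources:
  MyDatabase:
    Type: AWS::RDS::DBInstance
    DeletionPolicy: Snapshot    # take a final snapshot before deletion
```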

UpdatePolicy and UpdateReplacePolicy

  • Use the UpdatePolicy attribute to specify how AWS CloudFormation handles updates to the following resources:
    • AWS::AutoScaling::AutoScalingGroup
    • AWS::ElastiCache::ReplicationGroup
    • AWS::Elasticsearch::Domain
    • AWS::Lambda::Alias
  • Use the UpdateReplacePolicy attribute to retain or (in some cases) back up the existing physical instance of a resource when it is replaced during a stack update operation

Custom Resources

Custom resources enable you to write custom provisioning logic in templates that AWS CloudFormation runs anytime you create, update (if you changed the custom resource), or delete stacks

  • By creating a custom resource backed by an AWS Lambda function, you have the flexibility to define and orchestrate multiple AWS services as a single resource within your CloudFormation stack.
  • The custom Lambda function can handle the logic for creating and deleting the resources across multiple services in the desired order and configuration.

Deploying CloudFormation templates

  • Manual way:
    • Editing templates in the CloudFormation Designer
    • Using the console to input parameters, etc
  • Automated way:
    • Editing templates in a YAML file
    • Using the AWS CLI (Command Line Interface) to deploy the templates
    • Recommended way when you fully want to automate your flow

Intrinsic Functions

  • Fn::Ref (!Ref)
    • Returns the value of the specified parameter or resource
    • When you specify a parameter’s logical name, it returns the value of the parameter
    • When you specify a resource’s logical name, it returns a value that you can typically use to refer to that resource, such as a physical ID
  • Fn::GetAtt
    • Returns the value of an attribute from a resource in the template
    • Full syntax: Fn::GetAtt: [ logicalNameOfResource, attributeName ]
    • Short form: !GetAtt logicalNameOfResource.attributeName
  • Fn::FindInMap
    • Returns the value corresponding to keys in a two-level map that is declared in the Mappings section
    • Full syntax: Fn::FindInMap: [ MapName, TopLevelKey, SecondLevelKey ]
    • Short form: !FindInMap [ MapName, TopLevelKey, SecondLevelKey ]
  • Fn::ImportValue
  • Fn::Join
  • Fn::Sub
  • Condition Functions (Fn::If, Fn::Not, Fn::Equals, Fn::And, Fn::Or)
  • ...

🔗Full List
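
A short sketch combining several of these functions (resource names and the AMI ID are placeholders):

```yaml
Resources:
  WebSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: !Sub "Security group for ${AWS::StackName}"   # Fn::Sub
  WebServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-11111111                    # placeholder AMI ID
      SecurityGroups:
        - !Ref WebSecurityGroup                # Fn::Ref to another resource
Outputs:
  ServerPublicDns:
    Value: !GetAtt WebServer.PublicDnsName     # Fn::GetAtt attribute lookup
```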

Rollbacks and Stack Creation Failures

  • Stack creation failures:
    • By default everything will be deleted
    • Can modify the OnFailure attribute for a stack (see the CLI sketch after this list)
    • OnFailure must be one of:
      • DO_NOTHING – leaves the resources in place (good for troubleshooting)
      • ROLLBACK – rolls the stack back
      • DELETE – deletes the resources
  • Stack update failures:
    • A stack goes into the UPDATE_ROLLBACK_FAILED state when AWS CloudFormation cannot roll back all changes during an update
    • The stack will automatically roll back to the previous known working state
    • When a stack is in the UPDATE_ROLLBACK_FAILED state, you can continue to roll it back to a working state (UPDATE_ROLLBACK_COMPLETE)
    • You can't update a stack that is in the UPDATE_ROLLBACK_FAILED state
    • However, if you can continue to roll it back, you can return the stack to its original settings and then try to update it again
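
For example, keeping resources in place on a creation failure for troubleshooting (stack and template names are placeholders):

```bash
aws cloudformation create-stack \
  --stack-name my-stack \
  --template-body file://template.yaml \
  --on-failure DO_NOTHING    # instead of the default ROLLBACK
```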

Stacks

  • Deployed resources based on templates
  • You create, update and delete stacks using templates
  • Stacks are deployed through the Management Console, CLI or APIs
  • Stack creation errors:
    • Automatic rollback on error is enabled by default
    • You will be charged for resources provisioned even if there is an error

Nested stacks

cfn-console-nested-stacks

  • Nested stacks are stacks as part of other stacks
  • They allow you to isolate repeated patterns / common components in separate stacks and call them from other stacks
  • Example:
    • Load Balancer configuration that is re-used
    • Security Group that is re-used
  • Nested stacks are considered best practice
  • To update a nested stack, always update the parent (root stack)

Cross vs Nested Stacks

  • Cross Stacks:
    • Helpful when stacks have different lifecycles
    • Use Outputs Export and Fn::ImportValue
    • When you need to pass export values to many stacks (VPC Id, etc...)
  • Nested Stacks:
    • Helpful when components must be re-used
    • Ex: re-use how to properly configure an Application Load Balancer
    • The nested stack is only important to the higher-level stack (it’s not shared)

StackSets

  • Create, update, or delete stacks across multiple accounts and regions with a single operation
  • An administrator account is the AWS account in which you create StackSets
  • A target account is the account into which you create, update, or delete one or more stacks in your stack set
  • When you update a stack set, all associated stack instances are updated throughout all accounts and regions
  • StackSets can be targeted to specific organizational units (OUs) within AWS Organizations

Helper Scripts (cfn-init & cfn-signal)

  • cfn-init is a helper script that can be used to retrieve and interpret resource metadata
  • The cfn-init helper script reads template metadata from the AWS::CloudFormation::Init key and acts accordingly to:
    • Fetch and parse metadata from AWS CloudFormation
    • Install packages
    • Write files to disk
    • Enable/disable and start/stop services
  • Logs go to /var/log/cfn-init.log
  • To install applications, the UserData and Metadata (AWS::CloudFormation::Init) properties can be added to a template
  • The cfn-signal helper script signals AWS CloudFormation to indicate whether Amazon EC2 instances have been successfully created or updated
  • After installing software on EC2 instances, you can signal AWS CloudFormation when those software applications are ready
  • You use the cfn-signal script in conjunction with a CreationPolicy or an Auto Scaling group with a WaitOnResourceSignals update policy
  • In the UserData property, the template runs the cfn-signal script to send a success signal with an exit code if all the services are configured and started successfully
  • Troubleshooting errors:
    • Make sure the AMI has the CloudFormation helper scripts included
    • Check that the cfn-init and cfn-signal commands have run successfully
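
A sketch of the usual UserData pattern on Amazon Linux (the logical resource name WebServer is illustrative; the paths assume the helper scripts ship with the AMI):

```yaml
UserData:
  Fn::Base64: !Sub |
    #!/bin/bash -xe
    # Apply the AWS::CloudFormation::Init metadata of the WebServer resource
    /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} \
      --resource WebServer --region ${AWS::Region}
    # Signal the result of cfn-init back to CloudFormation
    /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} \
      --resource WebServer --region ${AWS::Region}
```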

CloudFormation Drift

  • CloudFormation allows you to create infrastructure
  • But it doesn’t protect you against manual configuration changes
  • How do we know if our resources have drifted? -> Use CloudFormation Drift Detection

AWS Elastic Beanstalk

Overview

  • Elastic Beanstalk is a developer-centric view of deploying an application on AWS
  • We still have full control over the configuration
  • Platform as a Service (PaaS)
    • Automatically handles capacity provisioning, load balancing, scaling, application health monitoring, instance configuration, ...
    • Just the application code is the responsibility of the developer
  • Beanstalk is free but you pay for the underlying instances
  • Supports Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker web applications
  • Supports the following languages and development stacks:
    • Apache Tomcat for Java applications
    • Apache HTTP Server for PHP applications
    • Apache HTTP Server for Python applications
    • Nginx or Apache HTTP Server for Node.js applications
    • Passenger or Puma for Ruby applications
    • Microsoft IIS 7.5, 8.0, and 8.5 for .NET applications
    • Java SE
    • Docker
    • Go

Components

  • Application:
    • Contain environments, environment configurations, and application versions
    • You can have multiple application versions held within an application
  • Application Version:
    • A specific reference to a section of deployable code
    • The application version will point typically to an Amazon S3 bucket containing the code
  • Environment:
    • An application version that has been deployed on AWS resources
    • The resources are configured and provisioned by AWS Elastic Beanstalk
    • The environment is comprised of all the resources created by Elastic Beanstalk and not just an EC2 instance with your uploaded code

Web Servers and Workers

  • Determines how Elastic Beanstalk provisions resources based on what the application is designed to do
  • Consists of Web Servers and Workers:
    • Web servers are standard applications that listen for and then process HTTP requests, typically over port 80
    • Workers are specialized applications that have a background processing task that listens for messages on an Amazon SQS queue
    • Workers should be used for long-running tasks

Deployment Options for Updates

  • All at once (deploy all in one go)
    • Deploys the new version to all instances simultaneously
    • All of your instances are out of service while the deployment takes place
    • Fastest deployment
    • Good for quick iterations in development environment
    • You will experience an outage (downtime) while the deployment is taking place - not ideal for mission-critical systems
    • If the update fails, you need to roll back the changes by re-deploying the original version to all of your instances
    • No additional cost
  • Rolling
    • Update a few instances at a time (batch), and then move onto the next batch once the first batch is healthy (downtime for 1 batch at a time)
    • Application is running both versions simultaneously
    • Each batch of instances is taken out of service while the deployment takes place
    • Your environment capacity will be reduced by the number of instances in a batch while the deployment takes place
    • Not ideal for performance-sensitive systems
    • If the update fails, you need to perform an additional rolling update to roll back the changes.
    • No additional cost
    • Long deployment time
  • Rolling with additional batches
    • Like Rolling but launches new instances in a batch ensuring that there is full availability
    • Application is running at capacity
    • Can set the batch size
    • Application is running both versions simultaneously
    • Small additional cost
    • Additional batch is removed at the end of the deployment
    • Longer deployment
    • Good for production environments
  • Immutable
    • Launches new instances in a new ASG and deploys the version update to these instances before swapping traffic to these instances once healthy
    • Zero downtime
    • New code is deployed to new instances using an ASG
    • High cost as double the number of instances running during updates
    • Longest deployment
    • Quick rollback in case of failures
    • Great for production environments
  • Blue / Green
    • Not a "direct feature" of Elastic Beanstalk
    • Zero downtime and release facility
    • Create a new "stage" environment and deploy v2 there, while v1 keeps serving production traffic
    • The new environment (green) can be validated independently and rolled back if there are issues
    • Route 53 can be setup using weighted policies to redirect a little bit of traffic to the stage environment
    • Using Beanstalk "swap URLs" when done with the environment test

Traffic Splitting

  • Canary Testing
  • New application version is deployed to a temporary ASG with the same capacity
  • A small % of traffic is sent to the temporary ASG for a configurable amount of time
  • Deployment health is monitored
  • If there’s a deployment failure, this triggers an automated rollback (very quick)
  • No application downtime
  • New instances are migrated from the temporary to the original ASG
  • Old application version is then terminated

Tip

Miners would take a canary into the mines with them to determine if the air held dangerous levels of toxic gases. The idea was that the bird had a faster breathing rate than humans and would show signs of the presence of toxic gases sooner than humans would. If the canary died, the miners would know that they needed to evacuate the mine.

Extensions

  • A zip file containing our code must be deployed to Elastic Beanstalk
  • All the parameters set in the UI can be configured with code using files
  • Requirements:
    • Must be in the .ebextensions/ directory in the root of the source code
    • YAML / JSON format
    • .config extensions (example: logging.config)
    • Able to modify some default settings using: option_settings
    • Ability to add resources such as RDS, ElastiCache, DynamoDB, etc...
  • Resources managed by .ebextensions get deleted if the environment goes away
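
A sketch of a .ebextensions config file (the file name and variable are illustrative):

```yaml
# .ebextensions/options.config
option_settings:
  aws:elasticbeanstalk:application:environment:
    LOG_LEVEL: debug        # sets an environment variable for the application
```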

Cloning

  • Clone an environment with the exact same configuration
  • Useful for deploying a "test" version of your application
  • All resources and configuration are preserved:
    • Load Balancer type and configuration
    • RDS database type (but the data is not preserved)
    • Environment variables
  • After cloning an environment, you can change settings

Migration: Load Balancer

  • After creating an Elastic Beanstalk environment, you cannot change the Elastic Load Balancer type (only the configuration)
  • To migrate:
    1. create a new environment with the same configuration except LB (can’t clone)
    2. deploy your application onto the new environment
    3. perform a CNAME swap or Route 53 update

AWS Cloud9

  • AWS Cloud9 is a cloud IDE (Integrated Development Environment) for writing, running and debugging code
  • Allows for code collaboration in real-time (pair programming)

AWS Systems Manager (SSM)

  • Helps you manage your EC2 and On-Premises systems at scale
  • Another Hybrid AWS service
  • Get operational insights about the state of your infrastructure
  • Suite of 10+ products
  • Most important features are:
    • Patching automation for enhanced compliance
    • Run commands across an entire fleet of servers
    • Store parameter configuration with the SSM Parameter Store
    • Works for both Windows and Linux OS

AWS OpsWorks

  • Chef & Puppet help you perform server configuration automatically and handle repetitive actions
  • They work great with EC2 & On-Premises VMs
  • AWS OpsWorks = Managed Chef & Puppet
  • It’s an alternative to AWS SSM
  • Only provision standard AWS resources: EC2 Instances, Databases, Load Balancers, EBS volumes...
  • Updates include patching, updating, backup, configuration and compliance management

CI/CD

AWS CodeCommit

  • Source-control service that hosts Git-based repositories -> Makes it easy to collaborate with others on code
  • The code changes are automatically versioned
  • Benefits:
    • Fully managed
    • Scalable & highly available
    • Private, Secured, Integrated with AWS
  • Authentication
    • SSH Keys – AWS Users can configure SSH keys in their IAM Console
    • HTTPS – with AWS CLI Credential helper or Git Credentials for IAM user
  • Authorization
    • IAM policies to manage users/roles permissions to repositories
  • Encryption
    • Encrypted in transit (can only use HTTPS or SSH – both secure)
    • Repositories are automatically encrypted at rest using AWS KMS
  • Cross-account Access
    • Do NOT share your SSH keys or your AWS credentials
    • Use an IAM Role in your AWS account and use AWS STS (AssumeRole API)

AWS CodeBuild

  • A fully managed continuous integration (CI) service
  • Continuous scaling (no servers to manage or provision – no build queue)
  • Compile source code, run tests, produce software packages, ...
  • Alternative to other build tools (e.g., Jenkins)
  • Charged per minute for compute resources (time it takes to complete the builds)
  • Leverages Docker under the hood for reproducible builds
  • Use prepackaged Docker images or create your own custom Docker image
  • Security:
    • Integration with KMS for encryption of build artifacts
    • IAM for CodeBuild permissions, and VPC for network security
    • AWS CloudTrail for API calls logging
  • Build instructions: Code file buildspec.yml or insert manually in Console
  • Output logs can be stored in Amazon S3 & CloudWatch Logs
  • Use CloudWatch Metrics to monitor build statistics
  • Use CloudWatch Events to detect failed builds and trigger notifications
  • Use CloudWatch Alarms to notify if you need "thresholds" for failures
  • Build Projects can be defined within CodePipeline or CodeBuild

buildspec.yml

  • buildspec.yml file must be at the root of your code
  • env – define environment variables
    • variables – plaintext variables
    • parameter-store – variables stored in SSM Parameter Store
    • secrets-manager – variables stored in AWS Secrets Manager
  • phases – specify commands to run:
    • install – install dependencies you may need for your build
    • pre_build – final commands to execute before build
    • build – actual build commands
    • post_build – finishing touches (e.g., zip output)
  • artifacts – what to upload to S3 (encrypted with KMS)
  • cache – files to cache (usually dependencies) to S3 for future build speedup
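
A minimal buildspec.yml sketch (the commands assume a hypothetical Node.js project):

```yaml
version: 0.2
env:
  variables:
    NODE_ENV: production
phases:
  install:
    commands:
      - npm ci              # install dependencies
  build:
    commands:
      - npm run build       # actual build commands
  post_build:
    commands:
      - echo "Build finished"
artifacts:
  files:
    - "**/*"
  base-directory: dist      # what to upload to S3
cache:
  paths:
    - node_modules/**/*     # cache dependencies for faster future builds
```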

AWS CodeDeploy

  • We want to deploy our application automatically
  • Works with EC2 Instances
  • Works with On-Premises Servers
  • Hybrid service
  • Servers / Instances must be provisioned and configured ahead of time with the CodeDeploy Agent

Primary Components

  • Application – a unique name functions as a container (revision, deployment configuration, ...)
  • Compute Platform – EC2/On-Premises, AWS Lambda, or Amazon ECS
  • Deployment Configuration – a set of deployment rules for success/failure
    • EC2/On-premises – specify the minimum number of healthy instances for the deployment
    • AWS Lambda or Amazon ECS – specify how traffic is routed to your updated versions
  • Deployment Group - a group of tagged EC2 instances (allows you to deploy gradually, or to dev, test, prod...)
  • Deployment Type – method used to deploy the application to a Deployment Group
    • In-place Deployment – supports EC2/On-Premises
    • Blue/Green Deployment – supports EC2 instances (not on-premises), AWS Lambda, and Amazon ECS
  • IAM Instance Profile – give EC2 instances the permissions to access both S3 / GitHub
  • Application Revision – application code + appspec.yml file
  • Service Role – an IAM Role for CodeDeploy to perform operations on EC2 instances, ASGs, ELBs...
  • Target Revision – the most recent revision that you want to deploy to a Deployment Group

appspec.yml

  • files – how to source and copy from S3 / GitHub to filesystem
    • source
    • destination
  • hooks – set of instructions to do to deploy the new version (hooks can have timeouts), the order is:
    • ApplicationStop
    • DownloadBundle
    • BeforeInstall
    • Install
    • AfterInstall
    • ApplicationStart
    • ValidateService <- important!!
    • BeforeAllowTraffic
    • AllowTraffic
    • AfterAllowTraffic
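
A minimal appspec.yml sketch for an EC2/On-Premises deployment (script paths are illustrative):

```yaml
version: 0.0
os: linux
files:
  - source: /                       # from the revision bundle
    destination: /var/www/html      # to the instance filesystem
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
  ValidateService:
    - location: scripts/validate.sh   # fail the deployment if checks fail
      timeout: 120
```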

Deployment Configuration

  • Configurations:
    • One At A Time – one EC2 instance at a time, if one instance fails then deployment stops
    • Half At A Time – 50%
    • All At Once – quick, but there is downtime since no hosts stay healthy; good for dev
    • Custom – e.g., minimum healthy hosts = 75%
  • Failures:
    • EC2 instances stay in "Failed" state
    • New deployments will first be deployed to failed instances
    • To rollback, redeploy old deployment or enable automated rollback for failures
  • Deployment Groups:
    • A set of tagged EC2 instances
    • Directly to an ASG
    • Mix of ASG / Tags so you can build deployment segments
    • Customization in scripts with DEPLOYMENT_GROUP_NAME environment variables

Deploy to an ASG

  • In-place Deployment
    • Updates existing EC2 instances
    • Newly created EC2 instances by an ASG will also get automated deployments
  • Blue/Green Deployment
    • A new Auto-Scaling Group is created (settings are copied)
    • Choose how long to keep the old EC2 instances (old ASG)
    • Must be using an ELB

AWS CodePipeline

  • Orchestrate the different steps to have the code automatically pushed to production
    • Code => Build => Test => Provision => Deploy
    • Basis for CICD (Continuous Integration & Continuous Delivery)
  • Benefits:
    • Fully managed, compatible with CodeCommit, CodeBuild, CodeDeploy, Elastic Beanstalk, CloudFormation, 3rd-party services (GitHub...) & custom plugins...
    • Fast delivery & rapid updates
  • Artifacts
    • Each pipeline stage can create artifacts
    • Artifacts stored in an S3 bucket and passed on to the next stage

AWS CodeArtifact

  • Software packages depend on each other to be built (also called code dependencies), and new ones are created
  • Storing and retrieving these dependencies is called artifact management
  • Traditionally you need to setup your own artifact management system
  • CodeArtifact is a secure, scalable, and cost-effective artifact management for software development
  • Works with common dependency management tools such as Maven, Gradle, npm, yarn, twine, pip, and NuGet
  • Developers and CodeBuild can then retrieve dependencies straight from CodeArtifact

AWS CodeStar

  • An integrated solution that groups: GitHub, CodeCommit, CodeBuild, CodeDeploy, CloudFormation, CodePipeline, CloudWatch, ...
  • Quickly create "CICD-ready" projects for EC2, Lambda, Elastic Beanstalk
  • Supported languages: C#, Go, Java, Node.js, PHP, Python, Ruby
  • Issue tracking integration with JIRA / GitHub Issues
  • Ability to integrate with Cloud9 to obtain a web IDE (not all regions)
  • One dashboard to view all your components
  • Free service, pay only for the underlying usage of other services
  • Limited Customization

Amazon CodeGuru

  • An ML-powered service for automated code reviews and application performance recommendations
  • Provides two functionalities
    • CodeGuru Reviewer: automated code reviews for static code analysis (development)
    • CodeGuru Profiler: visibility/recommendations about application performance during runtime (production)

Cloud Integration & Messaging

Amazon SQS

Note

  • An SQS queue cannot be placed in a VPC subnet
  • SQS does not have a security group

Standard Queue

Oldest offering; a fully managed service used to decouple applications

  • Attributes:
    • Unlimited throughput, unlimited number of messages in queue
    • Default retention of messages: 4 days; maximum of 14 days
    • Low latency (<10 ms on publish and receive)
    • Limitation of 256KB per message sent
  • Can have duplicate messages (at least once delivery, occasionally)
  • Can have out of order messages (best effort ordering)

Producing Messages

  • Produced to SQS using the SDK (SendMessage API)
  • The message is persisted in SQS until a consumer deletes it
  • Message retention: default 4 days, up to 14 days
  • SQS standard: unlimited throughput

Consuming Messages

  • Consumers (running on EC2 instances, servers, or AWS Lambda)...
  • Poll SQS for messages (receive up to 10 messages at a time)
  • Process the messages (example: insert the message into an RDS database)
  • Delete the messages using the DeleteMessage API
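
A minimal boto3 sketch of the produce/consume/delete cycle (the queue URL is a placeholder):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Producer: SendMessage API
sqs.send_message(QueueUrl=queue_url, MessageBody="hello")

# Consumer: poll up to 10 messages, waiting up to 20s (long polling)
resp = sqs.receive_message(QueueUrl=queue_url,
                           MaxNumberOfMessages=10,
                           WaitTimeSeconds=20)
for msg in resp.get("Messages", []):
    print(msg["Body"])  # process the message here
    # The message stays in the queue until explicitly deleted
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```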

Multiple EC2 Instances Consumers

  • Consumers receive and process messages in parallel
  • At least once delivery
  • Best-effort message ordering
  • Consumers delete messages after processing them
  • We can scale consumers horizontally to improve throughput of processing

Security

  • Encryption:
    • In-flight encryption using HTTPS API
    • At-rest encryption using KMS keys
    • Client-side encryption if the client wants to perform encryption/decryption itself
  • Access Controls: IAM policies to regulate access to the SQS API
  • SQS Access Policies (similar to S3 bucket policies)
    • Useful for cross-account access to SQS queues
    • Useful for allowing other services (SNS, S3...) to write to an SQS queue

Message Visibility Timeout

  • After a message is polled by a consumer, it becomes invisible to other consumers
  • By default, the "message visibility timeout" is 30 seconds -> That means the message has 30 seconds to be processed
  • Min: 0 seconds; Max: 12 hours
  • After the message visibility timeout is over, the message is "visible" in SQS
  • If a message is not processed within the visibility timeout, it may be processed twice
  • A consumer could call the ChangeMessageVisibility API to get more time
  • If visibility timeout is high (hours), and consumer crashes, re-processing will take time
  • If visibility timeout is too low (seconds), we may get duplicates

Dead Letter Queue

  • If a consumer fails to process a message within the Visibility Timeout... the message goes back to the queue!
  • We can set a threshold of how many times a message can go back to the queue
  • After the Maximum Receives threshold is exceeded, the message goes into a dead letter queue (DLQ)
  • The main task of a dead-letter queue is handling message failure -> Useful for debugging!
  • It is not a queue type, it is a standard or FIFO queue that has been specified as a dead-letter queue in the configuration of another standard or FIFO queue
    • DLQ of a FIFO queue must also be a FIFO queue
    • DLQ of a Standard queue must also be a Standard queue
  • Make sure to process the messages in the DLQ before they expire: Good to set a retention of 14 days in the DLQ

Delay Queue

  • Delay a message (consumers don’t see it immediately) up to 15 minutes
  • Default is 0 seconds (message is available right away)
  • Can set a default at queue level
  • Can override the default on send using the DelaySeconds parameter

Long Polling VS Short Polling

Long polling waits for the WaitTimeSeconds and eliminates empty responses
Short polling checks a subset of servers and may not return all messages

  • When a consumer requests messages from the queue, it can optionally "wait" for messages to arrive if there are none in the queue -> This is called Long Polling
  • Long Polling decreases the number of API calls made to SQS while increasing the efficiency and reducing the latency of your application
  • The wait time can be between 1 and 20 seconds (20 sec preferable)
  • Long Polling is preferable to Short Polling
  • Long polling can be enabled at the queue level or at the API level using WaitTimeSeconds
  • SQS Short polling returns immediately (even if the message queue is empty)

Extended Client

  • Message size limit is 256KB, how to send large messages, e.g. 1GB? => Using the SQS Extended Client (Java Library)
  • The producer sends a small metadata message to the SQS queue and, at the same time, uploads the large message to S3
  • The consumer receives the small metadata message from the SQS queue and retrieves the large message from S3

Useful APIs

  • CreateQueue (MessageRetentionPeriod), DeleteQueue
  • PurgeQueue: delete all the messages in queue
  • SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage
  • MaxNumberOfMessages: default 1, max 10 (for ReceiveMessage API)
  • ReceiveMessageWaitTimeSeconds: Long Polling
  • ChangeMessageVisibility: change the message timeout

FIFO Queue

FIFO: First In First Out (ordering of messages in the queue)

  • Limited throughput: 300 msg/s without batching, 3000 msg/s with batching; exactly-once send capability (by removing duplicates)
  • Messages are processed in order by the consumer
  • FIFO queues require the Message Group ID and Message Deduplication ID parameters to be added to messages
  • Message Group ID:
    • The tag that specifies that a message belongs to a specific message group; messages that belong to the same message group are guaranteed to be processed in a FIFO manner
    • You can have as many consumers as MessageGroupIDs for your FIFO queues
  • Message Deduplication ID:
    • The token used for deduplication of messages within the deduplication interval

FIFO – Deduplication

  • De-duplication interval is 5 minutes
  • Two de-duplication methods:
    • Content-based deduplication: will do a SHA-256 hash of the message body
    • Explicitly provide a MessageDeduplicationID
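
A boto3 sketch of sending to a FIFO queue (the queue URL and IDs are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue.fifo",
    MessageBody="order created",
    MessageGroupId="customer-42",         # ordering is guaranteed per group
    MessageDeduplicationId="order-1001",  # omit if content-based deduplication is on
)
```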

Amazon Kinesis

Kinesis = real-time big data streaming

Overview

  • Managed service to collect, process, and analyze real-time streaming data at any scale
  • Ingest real-time data such as: Application logs, Metrics, Website clickstreams, IoT telemetry data…
  • Kinesis Data Streams: capture, process, and store data streams
  • Kinesis Data Firehose: load data streams into AWS data stores
  • Kinesis Data Analytics: analyze data streams with SQL or Apache Flink
  • Kinesis Video Streams: capture, process, and store video streams

Kinesis Data Streams - Overview

  • Retention between 1 day to 365 days
  • Ability to reprocess (replay) data
  • Once data is inserted in Kinesis, it can’t be deleted (immutability)
  • Data that shares the same partition goes to the same shard (ordering)
  • Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
  • Consumers:
    • Write your own: Kinesis Client Library (KCL), AWS SDK
    • Managed: AWS Lambda, Kinesis Data Firehose, Kinesis Data Analytics

Kinesis Data Streams – Capacity Modes

  • Provisioned mode:
    • You choose the number of shards provisioned, scale manually or using API
    • Each shard gets 1MB/s in (or 1000 records per second)
    • Each shard gets 2MB/s out (classic or enhanced fan-out consumer)
    • You pay per shard provisioned per hour
  • On-demand mode:
    • No need to provision or manage the capacity
    • Default capacity provisioned (4 MB/s in or 4000 records per second)
    • Scales automatically based on observed throughput peak during the last 30 days
    • Pay per stream per hour & data in/out per GB

Kinesis Data Streams - Security

  • Control access / authorization using IAM policies
  • Encryption in flight using HTTPS endpoints
  • Encryption at rest using KMS
  • You can implement encryption/decryption of data on client side
  • VPC Endpoints available for Kinesis to access within VPC
  • Monitor API calls using CloudTrail

Kinesis Data Streams - Producers

Puts data records into data streams

  • Data record consists of:
    • Sequence number (unique per partition-key within shard)
    • Partition key (must specify while put records into stream)
    • Data blob (up to 1 MB)
  • Producers:
    • AWS SDK: simple producer
    • Kinesis Producer Library (KPL): a library for building high-performance, long-running producers (automatic batching & retries)
    • Kinesis Agent: monitor log files
  • Write throughput: 1 MB/sec or 1000 records/sec per shard
  • Use batching with PutRecords API to reduce costs & increase throughput
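
A boto3 producer sketch (the stream name and payload are placeholders):

```python
import boto3

kinesis = boto3.client("kinesis")
# Records with the same partition key go to the same shard (ordering)
kinesis.put_record(
    StreamName="my-stream",
    Data=b'{"temperature": 21.5}',
    PartitionKey="device-42",
)
```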

Kinesis Data Streams Consumers

Get data records from data streams and process them

  • AWS Lambda
  • Kinesis Data Analytics
  • Kinesis Data Firehose
  • Custom Consumer (AWS SDK) – Classic or Enhanced Fan-Out
  • Kinesis Client Library (KCL): library to simplify reading from data stream

Kinesis Data Firehose

  • Fully Managed Service, no administration, automatic scaling, serverless
    • AWS: Redshift / Amazon S3 / ElasticSearch
    • 3rd party partner: Splunk / MongoDB / DataDog / NewRelic / ...
    • Custom: send to any HTTP endpoint
  • Pay for data going through Firehose
  • Near Real Time
    • 60 seconds latency minimum for non-full batches
    • Or minimum 1MB of data at a time
  • Supports many data formats, conversions, transformations, compression
  • Supports custom data transformations using AWS Lambda
  • Can send failed or all data to a backup S3 bucket
  • Kinesis Data Firehose Destinations:
    • RedShift (via an intermediate S3 bucket)
    • Elasticsearch
    • Amazon S3
    • Splunk
    • Datadog
    • MongoDB
    • New Relic
    • HTTP Endpoint

Kinesis Data Analytics (SQL application)

  • Real-time analytics on Kinesis Data Streams & Firehose using SQL
  • Add reference data from Amazon S3 to enrich streaming data
  • Fully managed, no servers to provision
  • Automatic scaling
  • Pay for actual consumption rate
  • Output:
    • Kinesis Data Streams: create streams out of the real-time analytics queries
    • Kinesis Data Firehose: send analytics query results to destinations
    • Lambda
  • Use cases:
    • Time-series analytics
    • Real-time dashboards
    • Real-time metrics

Amazon SNS

Amazon SNS is a highly available, durable, secure, fully managed pub/sub messaging service

sns-delivery-protocols

  • Amazon SNS provides topics for high-throughput, push-based, many-to-many messaging
  • Multiple recipients can be grouped using Topics
  • A topic is an “access point” for allowing recipients to dynamically subscribe for identical copies of the same notification
  • One topic can support deliveries to multiple endpoint types
  • The "event publishers" only sends message to one SNS topic
  • As many "event subscribers"(subscriptions) as we want to listen to the SNS topic notifications
  • Each subscriber to the topic will get all the messages(note: new feature to filter messages)
  • Up to 12,500,000 subscriptions per topic, 100,000 topics limit
  • Endpoints include:
    • Amazon SQS queues
    • AWS Lambda functions
    • HTTP(S) webhooks
    • Mobile push
    • SMS
    • Email

How to publish

  • Topic Publish (using the SDK)
    • Create a topic
    • Create a/many subscription(s)
    • Publish to the topic
  • Direct Publish (for mobile apps SDK)
    • Create a platform application
    • Create a platform endpoint
    • Publish to the platform endpoint
    • Works with Google GCM, Apple APNS, Amazon ADM...
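
A boto3 sketch of the topic publish flow above (the topic name and email address are placeholders):

```python
import boto3

sns = boto3.client("sns")
topic_arn = sns.create_topic(Name="my-topic")["TopicArn"]    # idempotent
sns.subscribe(TopicArn=topic_arn, Protocol="email",
              Endpoint="user@example.com")
# Delivered to every subscription on the topic
sns.publish(TopicArn=topic_arn, Subject="Hi", Message="Hello from SNS")
```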

Security

  • Encryption:
    • In-flight encryption using HTTPS API
    • At-rest encryption using KMS keys
    • Client-side encryption if the client wants to perform encryption/decryption itself
  • Access Controls: IAM policies to regulate access to the SNS API
  • SNS Access Policies (similar to S3 bucket policies)
    • Useful for cross-account access to SNS topics
    • Useful for allowing other services ( S3...) to write to an SNS topic

Message Filtering

  • JSON policy used to filter messages sent to SNS topic’s subscriptions
  • If a subscription doesn’t have a filter policy, it receives every message

SQS vs SNS vs Kinesis

  • Consumer "pull data"
  • Data is deleted after being consumed
  • Can have as many workers (consumers) as we want
  • No need to provision throughput
  • Ordering guarantees only on FIFO queues
  • Individual message delay capability
  • Push data to many subscribers
  • Up to 12,500,000 subscribers
  • Data is not persisted (lost if not delivered)
  • Pub/Sub
  • Up to 100,000 topics
  • No need to provision throughput
  • Integrates with SQS for fan- out architecture pattern
  • FIFO capability for SQS FIFO
  • Standard: pull data(2 MB per shard)
  • Enhanced-fan out: push data(2 MB per shard per consumer)
  • Possibility to replay data
  • Meant for real-time big data, analytics and ETL
  • Ordering at the shard level
  • Data expires after X days
  • Provisioned mode or ondemand capacity mode

Amazon MQ

Amazon MQ: managed Apache ActiveMQ

  • SQS, SNS are "cloud-native" services, and they’re using proprietary protocols from AWS.
  • Traditional applications running on-premises may use open protocols such as: MQTT, AMQP, STOMP, Openwire, WSS
  • When migrating to the cloud, instead of re-engineering the application to use SQS and SNS, we can use Amazon MQ
  • Amazon MQ is a managed message broker service for RabbitMQ & ActiveMQ
  • Amazon MQ doesn’t "scale" as much as SQS / SNS
  • Amazon MQ runs on a dedicated machine (not serverless)
  • Amazon MQ has both queue feature (~SQS) and topic features (~SNS)

Cloud Monitoring, Logging, and Auditing

Tip

Amazon's five phases of the monitoring process:

  • Generation
  • Aggregation
  • Real-time processing and alarming
  • Storage
  • Analysis

CloudWatch Namespaces

  • A namespace is a container for CloudWatch metrics
  • Metrics in different namespaces are isolated from each other
  • You must specify a namespace for each data point you publish to CloudWatch
  • You can specify a namespace name when you create a metric

CloudWatch Metrics

  • CloudWatch provides metrics for every service in AWS
  • Metric is a variable to monitor (CPUUtilization, NetworkIn...)
  • Metrics belong to namespaces
  • Dimension is an attribute of a metric (instance id, environment, etc...)
  • Up to 30 dimensions per metric
  • Metrics exist within a region
  • Metrics cannot be deleted but automatically expire after 15 months
  • Metrics are uniquely defined by a name, a namespace, and zero or more dimensions
  • Time stamps can be up to two weeks in the past and up to two hours into the future
  • Can create CloudWatch dashboards of metrics

EC2 Detailed monitoring

  • EC2 instances have metrics "every 5 minutes" (free)
  • With detailed monitoring (for a cost), you get data "every 1 minute"
  • Use detailed monitoring if you want to scale faster for your ASG!
  • Note: EC2 Memory usage is by default not pushed (must be pushed from inside the instance as a custom metric)
  • Unified CloudWatch Agent sends system-level metrics for EC2 and on-premises servers

Custom Metrics

  • Possibility to define and send your own custom metrics to CloudWatch
  • Example: memory (RAM) usage, disk space, number of logged in users ...
  • Use API call PutMetricData
  • Ability to use dimensions (attributes) to segment metrics
    • Instance.id
    • Environment.name
  • Metric resolution (StorageResolution API parameter – two possible value):
    • Standard: 1 minute (60 seconds)
    • High Resolution: 1/5/10/30 second(s) – Higher cost
  • Important: Accepts metric data points two weeks in the past and two hours in the future
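
A boto3 sketch publishing a high-resolution custom metric (namespace, metric, and dimension names are illustrative):

```python
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_data(
    Namespace="MyApp",
    MetricData=[{
        "MetricName": "LoggedInUsers",
        "Dimensions": [{"Name": "Environment", "Value": "prod"}],
        "Value": 42,
        "Unit": "Count",
        "StorageResolution": 1,   # 1 = high resolution, 60 = standard
    }],
)
```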

Example

Example of CloudWatch agent configuration file:

"append_dimensions" : {
    "ImageId" : "${aws:ImageId}"
    "InstanceId" : "${aws:InstanceId}"
}

Tip

In custom metrics, the --dimensions parameter is common. A dimension further clarifies what the metric is and what data it stores. You can have up to 30 dimensions assigned to one metric, and each dimension is defined by a name and value pair.

Retention Periods

CloudWatch metrics are retained for 15 months with the following retention periods:

  • Custom, sub-minute metrics are aggregated to one-minute metrics after 3 hours.
  • One-minute metrics are aggregated to 5-minute metrics after 15 days.
  • Five-minute metrics are aggregated to hourly metrics after 63 days.
  • Hourly metric aggregates are discarded after 455 days (15 months).

Note

While the default metrics retention period is 15 months, the retention of logs in CloudWatch is indefinite.

CloudWatch API Actions(Metrics)

  • GetMetricData
    • Retrieve as many as 500 different metrics in a single request
  • PutMetricData
    • Publishes metric data points to Amazon CloudWatch
    • CloudWatch associates the data points with the specified metric
    • If the specified metric does not exist, CloudWatch creates the metric
  • GetMetricStatistics
    • Gets statistics for the specified metric
    • CloudWatch aggregates data points based on the length of the period that you specify
    • Maximum number of data points returned from a single call is 1,440

Unified CloudWatch Agent

  • The unified CloudWatch agent enables you to do the following:
    • Collect internal system-level metrics from Amazon EC2 instances across operating systems
    • Collect system-level metrics from on-premises servers
    • Retrieve custom metrics from your applications or services using the StatsD and collectd protocols
    • Collect logs from Amazon EC2 instances and on-premises servers (Windows / Linux)
  • Agent must be installed on the server
  • Can be installed on:
    • Amazon EC2 instances
    • On-premises servers
    • Linux, Windows Server, or macOS
  • You can use the following installation methods:
    • Command line
    • AWS Systems Manager
    • AWS CloudFormation
  • Installation Process:
    1. Create IAM roles to enable metric collection
    2. Download the agent package
    3. Update CloudWatch agent configuration file
    4. Start the agent

Note

  • When starting the agent, you must attach the IAM role (EC2), or specify a named profile (on-premises)
  • You can optionally integrate with AWS Systems Manager
  • The procstat plugin enables you to collect metrics from individual processes. It is supported on Linux servers and on servers running Windows Server 2012 or later

CloudWatch Logs

CloudWatch Logs can send logs to:

  • Amazon S3 (exports)
  • Kinesis Data Streams
  • Kinesis Data Firehose
  • AWS Lambda
  • ElasticSearch

Features

  • CloudWatch Logs Insights can be used to interactively search and analyze log data
  • Monitor applications and systems using log data
  • Create alarms based on API activity captured by CloudTrail
  • By default, logs are kept indefinitely and never expire
  • You can adjust the retention policy for each log group, keeping indefinite retention or choosing a retention period between one day and 10 years
  • CloudWatch Logs can be used to log information about the DNS queries that Route 53 receives

Concepts

  • Log events – a record of activity recorded by a monitored resource
  • Log streams – a sequence of log events that share the same source
  • Log groups – groups of log streams that share retention, monitoring, and access control settings
  • Metric filters – used to extract metric observations from ingested events and transform them to data points in a CloudWatch metric
  • Retention settings – specify how long log events are kept in CloudWatch Logs

Sources

  • SDK, CloudWatch Logs Agent, CloudWatch Unified Agent
  • Elastic Beanstalk: collection of logs from application
  • ECS: collection from containers
  • AWS Lambda: collection from function logs
  • VPC Flow Logs:VPC specific logs
  • API Gateway
  • CloudTrail based on filter
  • Route53: Log DNS queries

Namespaces and Dimensions

  • All data in CloudWatch is recorded within a specific namespace.
  • Default AWS metrics are identified by a service namespace and a metric name, e.g. the CPUUtilization metric in the AWS/EC2 namespace.
  • Dimensions are key:value pairs assigned to metrics to allow for a more granular analysis of those metrics within a specific namespace.
  • CloudWatch allows you to create up to 30 dimensions for each metric.

Metric Filter & Insights

  • CloudWatch Logs can use filter expressions
  • Metric filters can be used to trigger CloudWatch alarms
  • CloudWatch Logs Insights can be used to query logs and add queries to CloudWatch Dashboards
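
For example, a typical Logs Insights query for surfacing recent errors (the /ERROR/ pattern is illustrative):

```
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 20
```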

S3 Export

  • Log data can take up to 12 hours to become available for export
  • The API call is CreateExportTask
  • Not near-real time or real-time... use Logs Subscriptions instead

Logs for EC2

  • By default, no logs from your EC2 machine will go to CloudWatch
  • You need to run a CloudWatch agent on EC2 to push the log files you want
  • Make sure IAM permissions are correct
  • The CloudWatch log agent can be setup on-premises too

Logs Agent

  • For virtual servers (EC2 instances, on-premises servers...)
  • CloudWatch Logs Agent
    • Old version of the agent
    • Can only send to CloudWatch Logs

CloudWatch Alarms

  • Alarms are used to trigger notifications for any metric
  • Alarms targets:
    • Auto Scaling: increase or decrease EC2 instances "desired" count
    • EC2 Actions: stop, terminate, reboot or recover an EC2 instance
    • SNS notifications: send a notification into an SNS topic
  • Various options (sampling, %, max, min, etc...)
  • Can choose the period on which to evaluate an alarm -> Example: create a billing alarm on the CloudWatch Billing metric
  • Two types of alarms
    • Metric alarm – performs one or more actions based on a single metric
    • Composite alarm – uses a rule expression and takes into account multiple alarms
  • Metric alarm states:
    • OK – Metric is within a threshold
    • ALARM – Metric is outside a threshold
    • INSUFFICIENT_DATA – not enough data
  • Period:
    • Length of time in seconds to evaluate the metric
    • High resolution custom metrics: 10 sec, 30 sec or multiples of 60 sec

Note

You can create a custom metric using the API, and you can test an alarm by setting its state with the CLI command set-alarm-state

CloudWatch Events

  • Event Pattern: Intercept events from AWS services (Sources)
    • Example sources: EC2 Instance Start, CodeBuild Failure, S3, Trusted Advisor
    • Can intercept any API call with CloudTrail integration
  • Schedule or Cron (example: create an event every 4 hours)
  • A JSON payload is created from the event and passed to a target...
    • Compute: Lambda, Batch, ECS task
    • Integration: SQS, SNS, Kinesis Data Streams, Kinesis Data Firehose
    • Orchestration: Step Functions, CodePipeline, CodeBuild
    • Maintenance: SSM, EC2 Actions
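
A sketch of an event pattern for a rule that matches terminated EC2 instances:

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["EC2 Instance State-change Notification"],
  "detail": { "state": ["terminated"] }
}
```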

CloudWatch Container Insights

  • Collect, aggregate, summarize metrics and logs from containers
  • Available for containers on...
    • Amazon Elastic Container Service (Amazon ECS)
    • Amazon Elastic Kubernetes Services (Amazon EKS)
    • Kubernetes platforms on EC2
    • Fargate (both for ECS and EKS)
  • In Amazon EKS and Kubernetes, Container Insights uses a containerized version of the CloudWatch Agent to discover containers

CloudWatch Lambda Insights

  • Monitoring and troubleshooting solution for serverless applications running on AWS Lambda
  • Collects, aggregates, and summarizes system-level metrics including CPU time, memory, disk, and network
  • Collects, aggregates, and summarizes diagnostic information such as cold starts and Lambda worker shutdowns
  • Lambda Insights is provided as a Lambda Layer

CloudWatch Contributor Insights

  • Analyze log data and create time series that display contributor data
  • See metrics about the top-N contributors, the total number of unique contributors, and their usage
  • This helps you find top talkers and understand who or what is impacting system performance
  • Works for any AWS-generated logs (VPC, DNS, etc..)
  • For example, you can find bad hosts, identify the heaviest network users, or find the URLs that generate the most errors.
  • You can build your rules from scratch, or you can also use sample rules that AWS has created – leverages your CloudWatch Logs
  • CloudWatch also provides built-in rules that you can use to analyze metrics from other AWS services.

CloudWatch Application Insights

  • Provides automated dashboards that show potential problems with monitored applications, to help isolate ongoing issues
  • Your applications run on Amazon EC2 Instances with select technologies only (Java, .NET, Microsoft IIS Web Server, databases...)
  • And you can use other AWS resources such as Amazon EBS, RDS, ELB, ASG, Lambda, SQS, DynamoDB, S3 bucket, ECS, EKS, SNS, API Gateway...
  • Powered by SageMaker
  • Enhanced visibility into your application health to reduce the time it will take you to troubleshoot and repair your applications
  • Findings and alerts are sent to Amazon EventBridge and SSM OpsCenter

CloudWatch Synthetics

Amazon CloudWatch Synthetics allows you to create canaries that simulate customer behavior and monitor the availability of your endpoints and APIs

  • You can use Amazon CloudWatch Synthetics to create canaries, configurable scripts that run on a schedule, to monitor your endpoints and APIs.
  • Canaries follow the same routes and perform the same actions as a customer, which makes it possible for you to continually verify your customer experience even when you don't have any customer traffic on your applications.
  • By using canaries, you can discover issues before your customers do.

Amazon EventBridge

EventBridge used to be known as CloudWatch Events

EventBridge

  • Default event bus: generated by AWS services (CloudWatch Events)
  • Partner event bus: receive events from SaaS service or applications (Zendesk, DataDog, Segment, Auth0...)
  • Custom Event buses: for your own applications
  • Schema Registry: model event schema
  • EventBridge has a different name to mark the new capabilities
  • The CloudWatch Events name will be replaced with EventBridge

AWS CloudTrail

  • Provides governance, compliance and audit for your AWS Account
  • CloudTrail is enabled by default
  • Get a history of events / API calls made within your AWS Account by:
    • Console
    • SDK
    • CLI
    • AWS Services
  • Can put logs from CloudTrail into CloudWatch Logs or S3
  • A trail can be applied to All Regions (default) or a single Region.
  • If a resource is deleted in AWS, investigate CloudTrail first!
  • By default, management events are logged and retained for 90 days
  • A CloudTrail Trail logs any events to S3 for indefinite retention
  • Trail can be within Region or all Regions

Tip

CloudTrail logs record information about:

  • Who requested the action?
  • Where did the request originate from and when?
  • What was requested?
  • The full API response

Tip

CloudTrail Trails can be configured as multi-region trails which will log events from all regions. These logs can be delivered to a single Amazon S3 bucket. CloudTrail should have write permission on this Amazon S3 bucket to deliver logs.

CloudTrail Events

  • Management Events:
    • Provide information about management operations that are performed on resources in your AWS account
    • Examples:
      • Configuring security (IAM AttachRolePolicy)
      • Configuring rules for routing data (Amazon EC2 CreateSubnet)
      • Setting up logging (AWS CloudTrail CreateTrail)
    • By default, trails are configured to log management events
    • Can separate Read Events (that don’t modify resources) from Write Events (that may modify resources)
  • Data Events:
    • Provide information about the resource operations performed on or in a resource
    • By default, data events are not logged (because of their high volume)
    • Amazon S3 object-level activity (ex: GetObject, DeleteObject, PutObject): can separate Read and Write Events
    • AWS Lambda function execution activity (the Invoke API)
  • CloudTrail Insights Events:
    • Identify and respond to unusual activity associated with write API calls by continuously analyzing CloudTrail management events
    • These events are generated only when there are significant variations from normal usage patterns

Note

  • CloudWatch Events can be triggered based on API calls in CloudTrail
  • Events can be streamed to CloudWatch Logs

Example

  • data events:
    • Amazon S3 object-level API activity (for example, GetObject, DeleteObject, and PutObject API operations)
    • AWS Lambda function invocation activity (for example, InvokeFunction API operations)
    • Amazon DynamoDB object-level API activity on tables (for example, PutItem, DeleteItem, and UpdateItem)
  • management events:
    • Creating an Amazon S3 bucket
    • Creating and managing AWS IAM resources
    • Registering devices
    • Configuring routing table rules

CloudTrail Insights

  • Enable CloudTrail Insights to detect unusual activity in your account:
    • inaccurate resource provisioning
    • hitting service limits
    • Bursts of AWS IAM actions
    • Gaps in periodic maintenance activity
  • CloudTrail Insights analyzes normal management events to create a baseline
  • And then continuously analyzes write events to detect unusual patterns
    • Anomalies appear in the CloudTrail console
    • Event is sent to Amazon S3
    • An EventBridge event is generated (for automation needs)

CloudTrail Events Retention

  • Events are stored for 90 days in CloudTrail
  • To keep events beyond this period, log them to S3 and use Athena

Validate CloudTrail Log File Integrity

  • To determine whether a log file was modified, deleted, or unchanged after CloudTrail delivered it, you can use CloudTrail log file integrity validation.
  • When you enable log file integrity validation, CloudTrail creates a hash for every log file that it delivers.
  • Every hour, CloudTrail also creates and delivers a file that references the log files for the last hour and contains a hash of each. This file is called a digest file. Validated log files are invaluable in security and forensic investigations

Tip

The digest file is delivered to the same Amazon S3 bucket as the CloudTrail log files, but in a separate folder. Separate security policies can be applied to the folder containing the digest files.

AWS X-Ray

Advantages

  • Troubleshooting performance (bottlenecks)
  • Understand dependencies in a microservice architecture
  • Pinpoint service issues
  • Review request behavior
  • Find errors and exceptions
  • Are we meeting the time SLA?
  • Where am I throttled?
  • Identify users that are impacted

Compatibility

  • AWS Lambda
  • Elastic Beanstalk
  • ECS
  • ELB
  • API Gateway
  • EC2 Instances or any application server (even on premise)
  • SQS

Concepts

  • Segments:
    • consist of tracing records for a request that a distributed application makes
    • consist of multiple system-defined & user-defined annotations
  • Subsegments: consist of remote calls made from the application
  • Trace: segments collected together to form an end-to-end trace
  • Sampling: decrease the amount of requests sent to X-Ray, reduce cost
  • Annotations: Key Value pairs used to index traces and use with filters, consist of system-defined & user-defined data
  • Metadata: Key Value pairs, not indexed, not used for searching
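
A minimal sketch with the aws-xray-sdk for Python (segment and key names are illustrative, and a running X-Ray daemon is assumed) showing how segments, subsegments, annotations, and metadata fit together:

```python
from aws_xray_sdk.core import xray_recorder

# Illustrative names; assumes the X-Ray daemon is running to receive the trace.
with xray_recorder.in_segment("checkout-service") as segment:
    # Annotations are key-value pairs that are indexed for use with filter expressions
    segment.put_annotation("customer_tier", "premium")
    # Metadata is stored with the trace but is NOT indexed / searchable
    segment.put_metadata("cart", {"items": 3, "total": 42.5}, "order")

    # Subsegments record remote/downstream calls made from the application
    with xray_recorder.in_subsegment("call-payment-api") as subsegment:
        subsegment.put_annotation("payment_provider", "example")
```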

Service Maps

  • AWS X-Ray Service maps can be used to create dependency trees between services across multiple AWS regions
  • They also detect latency between these services which can help resolve performance issues

CloudTrail vs CloudWatch vs X-Ray

  • CloudTrail:
    • Audit API calls made by users / services / AWS console
    • Useful to detect unauthorized calls or the root cause of changes
  • CloudWatch:
    • Metrics over time for monitoring
    • Logs for storing application logs
    • Alarms to send notifications in case of unexpected metrics
  • X-Ray:
    • Automated Trace Analysis & Central Service Map Visualization
    • Latency, Errors and Fault analysis
    • Request tracking across distributed systems

ServiceLens

ServiceLens can be used to enhance application health monitoring by integrating traces, metrics, logs, alarms, and other resource health information into one place.
ServiceLens integrates with X-Ray to provide an end-to-end view of the application.

ServiceLens can be deployed in two steps:

  • Deploy X-Ray
  • Deploy the CloudWatch agent and the X-Ray daemon

AWS Status - Service Health Dashboard

  • Shows all regions, all services current status information on service availability
  • Shows historical information for each day
  • Has an RSS feed you can subscribe to

Note

  • Not personalized information, so it may not be relevant to you
  • No proactive notification of scheduled activities

AWS Personal Health Dashboard

  • AWS Personal Health Dashboard provides alerts and remediation guidance when AWS is experiencing events that may impact you.
  • While the Service Health Dashboard displays the general status of AWS services, Personal Health Dashboard gives you a personalized view into the performance and availability of the AWS services underlying your AWS resources.
  • Also provides proactive notification to help you plan for scheduled activities
  • The dashboard displays relevant and timely information to help you manage events in progress and provides proactive notification to help you plan for scheduled activities.

Amazon Managed Service for Prometheus

Prometheus is an open-source monitoring system and time series database

  • Use the open-source Prometheus query language (PromQL) to monitor and alert on the performance of containerized workloads
  • Automatically scales the ingestion, storage, alerting, and querying of operational metrics as workloads grow or shrink
  • Integrated with Amazon EKS, Amazon ECS, and AWS Distro for OpenTelemetry

Amazon Managed Grafana

Grafana is an open-source analytics and monitoring solution for databases

  • Highly scalable, highly available, and fully managed Grafana service
  • Provides interactive data visualization for your monitoring and operational data
  • Visualize, analyze, and alarm on your metrics, logs, and traces collected from multiple data sources
  • Integrates with AWS SSO and SAML

Security & Compliance

AWS Shared Responsibility Model

  • AWS responsibility - Security of the Cloud
    • Protecting infrastructure (hardware, software, facilities, and networking) that runs all the AWS services
    • Managed services like S3, DynamoDB, RDS, etc.
  • Customer responsibility - Security in the Cloud
    • For EC2 instance, customer is responsible for management of the guest OS (including security patches and updates), firewall & network configuration, IAM
    • Encrypting application data
  • Shared controls:
    • Patch Management, Configuration Management, Awareness & Training

DDOS Protection on AWS

  • AWS Shield Standard: protects against DDoS attacks for your website and applications, for all customers at no additional cost
  • AWS Shield Advanced: 24/7 premium DDoS protection
  • AWS WAF: Filter specific requests based on rules
  • CloudFront and Route 53:
    • Availability protection using global edge network
    • Combined with AWS Shield, provides attack mitigation at the edge

AWS WAF – Web Application Firewall

WAF

  • Protects your web applications from common web exploits (Layer 7)
  • Deploy on:
    • Application Load Balancer
    • API Gateway REST API
    • CloudFront
    • AppSync

Concepts

  • Web ACLs – You use a web access control list (ACL) to protect a set of AWS resources
  • Rules – Each rule contains a statement that defines the inspection criteria, and an action to take if a web request meets the criteria. Can be created based on conditions like HTTP headers, HTTP body, URI strings, SQL injection, and cross-site scripting
  • Rule groups – You can use rules individually or in reusable rule groups
  • IP Sets - An IP set provides a collection of IP addresses and IP address ranges that you want to use together in a rule statement
  • Regex Pattern Set - A regex pattern set provides a collection of regular expressions that you want to use together in a rule statement

Tip

You are charged for each web ACL configured on AWS WAF.

Define Web ACL (Web Access Control List)

  • Rules can include IP addresses, HTTP headers, HTTP body, or URI strings
  • Protects from common attack - SQL injection and Cross-Site Scripting (XSS)
  • Size constraints, geo-match (block countries)
  • Rate-based rules (to count occurrences of events) – for DDoS protection
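
A sketch of creating a web ACL with a rate-based rule through the WAFv2 API (names, the limit, and the scope are illustrative):

```python
import boto3

wafv2 = boto3.client("wafv2")

# Illustrative names/limits; Scope="REGIONAL" targets ALB / API Gateway,
# use Scope="CLOUDFRONT" (in us-east-1) for CloudFront distributions.
wafv2.create_web_acl(
    Name="demo-web-acl",
    Scope="REGIONAL",
    DefaultAction={"Allow": {}},
    Rules=[
        {
            "Name": "rate-limit-per-ip",
            "Priority": 1,
            # Block any source IP exceeding 2000 requests per 5 minutes
            "Statement": {
                "RateBasedStatement": {"Limit": 2000, "AggregateKeyType": "IP"}
            },
            "Action": {"Block": {}},
            "VisibilityConfig": {
                "SampledRequestsEnabled": True,
                "CloudWatchMetricsEnabled": True,
                "MetricName": "RateLimitPerIP",
            },
        }
    ],
    VisibilityConfig={
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": "DemoWebAcl",
    },
)
```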

Tip

  • A web ACL can allow or deny traffic based on the source IP address, country of origin of the request, string match or regular expression (regex) match, or the detection of malicious SQL code or scripting.
  • You can also use the logs that are generated to examine the number of requests, the nature of those requests, and where they originate from.

Rule Action

A rule action tells AWS WAF what to do with a web request when it matches the criteria defined in the rule

  • Count – AWS WAF counts the request but doesn't determine whether to allow it or block it. With this action, AWS WAF continues processing the remaining rules in the web ACL
  • Allow – AWS WAF allows the request to be forwarded to the AWS resource for processing and response
  • Block – AWS WAF blocks the request and the AWS resource responds with an HTTP 403 (Forbidden) status code

Match Statement

Match statements compare the web request or its origin against conditions that you provide

| Match Statement | Description |
| --- | --- |
| Geographic match | Inspects the request's country of origin |
| IP set match | Inspects the request against a set of IP addresses or address ranges |
| Regex pattern set match | Inspects the request against a set of regular expressions |
| Size constraint | Inspects the size of a specified request component |
| SQL injection match | Inspects the request for SQL code that is intended to execute malicious commands |
| String match | Compares a string to a specified request component, such as the URI or query string |
| XSS match | Inspects the request for cross-site scripting attacks |

AWS Shield

  • AWS Shield is a managed Distributed Denial of Service (DDoS) protection service
  • Safeguards web application running on AWS with always-on detection and automatic inline mitigations
  • Helps to minimize application downtime and latency
  • Two tiers
    • Standard
      • Automatically enabled free of charge for all AWS customers
      • Defends against frequently occurring network and transport layer DDoS attacks
      • Incoming traffic is inspected for malicious patterns in real time
    • Advanced
      • Provides mitigations against large and sophisticated DDoS attacks and near real-time visibility into attacks
      • AWS Shield Advanced also provides integration with AWS Web Application Firewall
      • Protects resources on EC2, Elastic Load Balancing, CloudFront, AWS Global Accelerator, and Route 53
  • Integrated with Amazon CloudFront (standard included by default)

Why encryption?

  • Encryption in flight (SSL/TLS):
    • Data is encrypted before sending and decrypted after receiving
    • SSL certificates help with encryption (HTTPS)
    • Encryption in flight ensures no MITM (man in the middle) attack can happen
  • Server-side encryption at rest:
    • Data is encrypted after being received by the server
    • Data is decrypted before being sent
    • It is stored in an encrypted form thanks to a key (usually a data key)
    • The encryption / decryption keys must be managed somewhere, and the server must have access to them
  • Client-side encryption:
    • Data is encrypted by the client and never decrypted by the server
    • Data will be decrypted by a receiving client
    • The server should not be able to decrypt the data
    • Could leverage Envelope Encryption

AWS KMS (Key Management Service)

KMS = AWS manages the encryption keys for us

  • Fully integrated with IAM for authorization
  • Create and manage symmetric and asymmetric encryption keys
  • The KMS keys are protected by hardware security modules (HSMs)
  • Easy way to control access to your data
  • Able to audit KMS Key usage using CloudTrail
  • Seamlessly integrated into most AWS services (EBS, S3, RDS, SSM…)

Danger

Never ever store your secrets in plaintext, especially in your code!

KMS key encryption is also available through API calls (SDK, CLI). Encrypted secrets can then be stored in code or environment variables.

  • Encryption Opt-in:
    • EBS volumes: encrypt volumes
    • S3 buckets: Server-side encryption of objects
    • Redshift database: encryption of data
    • RDS database: encryption of data
    • EFS drives: encryption of data
  • Encryption Automatically enabled:
    • CloudTrail Logs
    • S3 Glacier
    • Storage Gateway

KMS Key Types

KMS Key is the new name for the KMS Customer Master Key (CMK)

  • Symmetric (AES-256 keys):
    • Single encryption key that is used to Encrypt and Decrypt
    • AWS services that are integrated with KMS use symmetric KMS keys
    • You never get access to the KMS key unencrypted (must call the KMS API to use it)
  • Asymmetric (RSA & ECC key pairs):
    • Public (Encrypt) and Private (Decrypt) key pair
    • Used for Encrypt/Decrypt, or Sign/Verify operations
    • The public key is downloadable, but you can’t access the private key unencrypted
    • Use case: encryption outside of AWS by users who can’t call the KMS API

Key Types

  • Three types of KMS Keys:
    • AWS Managed Keys: free (aws/service-name, example: aws/rds or aws/ebs)
    • Customer Managed Keys (CMK) created in KMS: $1 / month
    • Customer Managed Keys imported (must be 256-bit symmetric key): $1 / month
  • + pay for API call to KMS ($0.03 / 10000 calls)

Tip

If you import keys into AWS KMS, use asymmetric keys, or use a custom key store, you must rotate them manually: create a new KMS key and re-map the existing key alias from the old KMS key to the new one.

Info

AWS Managed KMS Keys

  • Created, managed, and used on your behalf by an AWS service that is integrated with AWS KMS
  • You cannot manage these KMS keys, rotate them, or change their key policies
  • You also cannot use AWS managed KMS keys in cryptographic operations directly; the service that creates them uses them on your behalf

Automatic Key Rotation

  • AWS-managed KMS Key: automatic every 1 year
  • Customer-managed KMS Key: (must be enabled) automatic every 1 year
  • Imported KMS Key: only manual rotation possible using alias

Note

Rotation only changes the key material used for encryption, the KMS key remains the same

With automatic key rotation:

  • The properties of the KMS key, including its key ID, key ARN, region, policies, and permissions, do not change when the key is rotated
  • You do not need to change applications or aliases that refer to the key ID or key ARN of the KMS key
  • After you enable key rotation, AWS KMS rotates the KMS key automatically every year

Automatic key rotation is not supported on the following types of KMS keys:

  • Asymmetric KMS keys
  • HMAC KMS keys
  • KMS keys in custom key stores
  • KMS keys with imported key material

Info

You can rotate these KMS keys manually

Manual Rotation

  • Manual rotation is creating a new KMS key with a different key ID
  • You must then update your applications with the new key ID
  • You can use an alias to represent a KMS key so you don’t need to modify your application code

Alternative Key Stores

External Key Store

  • Keys can be stored outside of AWS to meet regulatory requirements
  • You can create a KMS key in an AWS KMS external key store (XKS)
  • All keys are generated and stored in an external key manager
  • When using an XKS, key material never leaves your HSM

Custom Key Store

  • You can create KMS keys in an AWS CloudHSM custom key store
  • All keys are generated and stored in an AWS CloudHSM cluster that you own and manage
  • Cryptographic operations are performed solely in the AWS CloudHSM cluster you own and manage
  • Custom key stores are not available for asymmetric KMS keys

KMS Key Policies

  • Key policies define management and usage permissions for KMS keys ("similar" to S3 bucket policies)
  • Difference: you cannot control access without them
  • Multiple policy statements can be combined to specify separate administrative and usage permissions
  • Permissions can be specified for delegating use of the key to AWS services
  • Default KMS Key Policy:
    • Created if you don’t provide a specific KMS Key Policy
    • Complete access to the key to the root user = entire AWS account
  • Custom KMS Key Policy:
    • Define users, roles that can access the KMS key
    • Define who can administer the key
    • Useful for cross-account access of your KMS key

Tip

  • To share snapshots with another account you must specify Decrypt and CreateGrant permissions
  • The kms:ViaService condition key can be used to limit key usage to specific AWS services
  • Cryptographic erasure means removing the ability to decrypt data and can be achieved when using imported key material and deleting that key material
  • You must use the DeleteImportedKeyMaterial API to remove the key material
  • An InvalidKeyId exception when using SSM Parameter Store indicates the KMS key is not enabled

Envelope Encryption

  • The KMS Encrypt API call has a limit of 4 KB
  • If you want to encrypt more than 4 KB of data, you need to use Envelope Encryption
  • The main API that will help us is the GenerateDataKey API
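
A minimal envelope-encryption sketch (the key alias is illustrative; the local AES step uses the third-party cryptography package):

```python
import os

import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")
KEY_ID = "alias/my-app-key"  # illustrative KMS key alias

# 1. Ask KMS for a data key: returns a plaintext copy and a KMS-encrypted copy
data_key = kms.generate_data_key(KeyId=KEY_ID, KeySpec="AES_256")

# 2. Encrypt the (possibly large) payload locally with the plaintext data key
nonce = os.urandom(12)
ciphertext = AESGCM(data_key["Plaintext"]).encrypt(nonce, b"large payload...", None)

# 3. Store the ciphertext together with the ENCRYPTED data key;
#    discard the plaintext data key from memory
envelope = {"ciphertext": ciphertext, "nonce": nonce,
            "encrypted_key": data_key["CiphertextBlob"]}

# To decrypt: ask KMS to decrypt the data key, then AES-decrypt locally
plaintext_key = kms.decrypt(CiphertextBlob=envelope["encrypted_key"])["Plaintext"]
original = AESGCM(plaintext_key).decrypt(envelope["nonce"], envelope["ciphertext"], None)
```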

API & CLI

Encrypt (aws kms encrypt):

  • Encrypts plaintext into ciphertext by using a KMS key
  • You can encrypt small amounts of arbitrary data, such as a personal identifier or database password, or other sensitive information
  • You can use the Encrypt operation to move encrypted data from one AWS region to another

Decrypt (aws kms decrypt):

  • Decrypts ciphertext that was encrypted by a KMS key using any of the following operations:
  • Encrypt
  • GenerateDataKey
  • GenerateDataKeyPair
  • GenerateDataKeyWithoutPlaintext
  • GenerateDataKeyPairWithoutPlaintext

Re-encrypt (aws kms re-encrypt):

  • Decrypts ciphertext and then re-encrypts it entirely within AWS KMS
  • You can use this operation to change the KMS key under which data is encrypted, such as when you manually rotate a KMS key or change the KMS key that protects a ciphertext
  • You can also use it to re-encrypt ciphertext under the same KMS key, such as to change the encryption context of a ciphertext

Enable-key-rotation:

  • Enables automatic rotation of the key material for the specified symmetric KMS key
  • You cannot perform this operation on a KMS key in a different AWS account

GenerateDataKey (aws kms generate-data-key):

  • Generates a unique symmetric data key
  • This operation returns a plaintext copy of the data key and a copy that is encrypted under a KMS key that you specify
  • You can use the plaintext key to encrypt your data outside of AWS KMS and store the encrypted data key with the encrypted data

GenerateDataKeyWithoutPlaintext (generate-data-key-without-plaintext):

  • Generates a unique symmetric data key
  • This operation returns a data key that is encrypted under a KMS key that you specify
  • To request an asymmetric data key pair, use the GenerateDataKeyPair or GenerateDataKeyPairWithoutPlaintext operations

Throttling and Caching

  • AWS KMS has two types of quotas:
    • Resource quotas
    • Request quotas
  • If you exceed a resource quota, requests to create an additional resource of that type generate a LimitExceededException error message
  • Request quotas apply to API actions such as Encrypt, Decrypt, ReEncrypt, and GenerateDataKey
  • To prevent throttling, you can:
    • Implement a backoff and retry strategy
    • Request a service quota increase
    • Implement data key caching
  • Data key caching stores data keys and related cryptographic material in a cache
  • Useful if your application:
    • Can reuse data keys
    • Generates numerous data keys
    • Runs cryptographic operations that are unacceptably slow, expensive, limited, or resource-intensive
  • You can create a local cache using the AWS Encryption SDK and the LocalCryptoMaterialsCache feature
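
One way to implement the backoff-and-retry part (a sketch; the retry mode and attempt count are illustrative) is to let the SDK handle it via the botocore client configuration:

```python
import boto3
from botocore.config import Config

# Retries with exponential backoff are handled by the SDK itself;
# "adaptive" mode also applies client-side rate limiting.
retry_config = Config(retries={"max_attempts": 10, "mode": "adaptive"})

kms = boto3.client("kms", config=retry_config)
# Throttled Encrypt/Decrypt/GenerateDataKey calls will now be retried
# automatically with backoff instead of failing immediately.
```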

Encryption SDK

  • The AWS Encryption SDK implements Envelope Encryption for us
  • The Encryption SDK also exists as a CLI tool we can install
  • Implementations for Java, Python, C, JavaScript
  • Feature - Data Key Caching:
    • re-use data keys instead of creating new ones for each encryption
    • Helps with reducing the number of calls to KMS with a security trade-off
    • Use LocalCryptoMaterialsCache (max age, max bytes, max number of messages)

CloudHSM

KMS => AWS manages the software for encryption
CloudHSM => AWS provisions encryption hardware

  • Dedicated Hardware (HSM = Hardware Security Module)
  • You manage your own encryption keys entirely (not AWS)
  • HSM device is tamper resistant, with FIPS 140-2 Level 3 validation
  • CloudHSM runs in your Amazon VPC
  • Managed service and automatically scales
  • Retain control of your encryption keys - you control access (and AWS has no visibility of your encryption keys)
  • Use Cases:
    • Offload SSL/TLS processing from web servers
    • Protect private keys for an issuing certificate authority (CA)
    • Store the master key for Oracle DB Transparent Data Encryption
    • Custom key store for AWS KMS – retain control of the HSM that protects the master keys

AWS Certificate Manager (ACM)

  • Easily provision, manage, and deploy SSL/TLS X.509 Certificates
  • Used to provide in-flight encryption for websites (HTTPS)
  • Supports both public and private TLS certificates
  • Free of charge for public TLS certificates
  • Automatic TLS certificate renewal
  • Single domains, multiple domain names and wildcards
  • Integrates with several AWS services including:
    • Elastic Load Balancers
    • Amazon CloudFront Distributions
    • AWS API Gateway
    • AWS Elastic Beanstalk
    • AWS Nitro Enclaves
    • AWS CloudFormation
    • AWS App Runner
  • Public certificates are signed by the AWS public Certificate Authority
  • You can also create a Private CA with ACM
  • Can then issue private certificates
  • You can also import certificates from third-party issuers

SSM Parameter Store

  • Parameter Store provides secure, hierarchical storage for configuration data and secrets
  • Optional Seamless Encryption using KMS
  • Serverless, scalable, durable, easy SDK
  • Version tracking of configurations / secrets
  • Security through IAM
  • Notifications with Amazon EventBridge
  • Integration with CloudFormation
  • Store data such as passwords, database strings, and license codes as parameter values
  • Store values as plaintext (unencrypted data) or ciphertext (encrypted data)
  • Reference values by using the unique name that you specified when you created the parameter
  • No native rotation of keys (difference with AWS Secrets Manager which does it automatically)
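
A small sketch (the parameter name is illustrative) of storing and reading a SecureString parameter:

```python
import boto3

ssm = boto3.client("ssm")

# Store a secret as an encrypted (SecureString) parameter; KMS encryption
# uses the account's default aws/ssm key unless a KeyId is specified
ssm.put_parameter(
    Name="/my-app/prod/db-password",   # illustrative hierarchical name
    Value="s3cr3t",
    Type="SecureString",
    Overwrite=True,
)

# Read it back, asking SSM to decrypt the value for us
param = ssm.get_parameter(Name="/my-app/prod/db-password", WithDecryption=True)
print(param["Parameter"]["Value"])
```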

AWS Secrets Manager

  • Stores and rotates secrets safely without the need for code deployments
  • Capability to force rotation of secrets every X days
  • Automate generation of secrets on rotation (uses Lambda)
  • Built-in Integration with:
    • Amazon RDS (MySQL, PostgreSQL, Aurora)
    • Amazon Redshift
    • Amazon DocumentDB
  • Secrets are encrypted using KMS
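
A minimal retrieval sketch (the secret name is illustrative):

```python
import json

import boto3

secrets = boto3.client("secretsmanager")

# Fetch the current version of a secret (name is illustrative);
# RDS-integrated secrets store a JSON blob with username/password/host fields
response = secrets.get_secret_value(SecretId="prod/my-app/rds")
credentials = json.loads(response["SecretString"])
print(credentials["username"])
```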

Tip

AWS Secrets Manager can be configured to replicate secrets across multiple regions. With this feature, multi-region applications, such as those backed by Amazon RDS, can retrieve secrets locally within each region instead of retrieving them across regions.

Following are features when AWS Secrets Manager is configured to replicate secrets in multiple regions:

  • Replicated Secrets will have a common name across all regions.
  • Secrets Manager replicates all encrypted secrets and metadata.

SSM Parameter Store vs Secrets Manager

  • Secrets Manager:
    • Automatic rotation of secrets with AWS Lambda
    • Lambda function is provided for RDS, Redshift, DocumentDB
    • KMS encryption is mandatory
    • Can integrate with CloudFormation
  • SSM Parameter Store:
    • Simple API
    • No secret rotation (can enable rotation using Lambda triggered by CW Events)
    • KMS encryption is optional
    • Can integrate with CloudFormation
    • Can pull a Secrets Manager secret using the SSM Parameter Store API

Amazon Inspector

  • Runs assessments that check for security exposures and vulnerabilities in EC2 instances
  • For EC2 instances
    • Leveraging the AWS System Manager (SSM) agent
    • Analyze against unintended network accessibility
    • Analyze the running OS against known vulnerabilities
  • For containers pushed to Amazon ECR
    • Assessment of containers as they are pushed
  • Reporting & integration with AWS Security Hub
  • Send findings to Amazon EventBridge
  • Only evaluates EC2 instances and container infrastructure
  • Can be configured to run on a schedule
  • Agent must be installed on EC2 for host assessments
  • Network assessments do not require an agent

Tip

An assessment template for Amazon Inspector consists of the following parameters:

  • Name
  • Target Name
  • Rules Package
  • Duration

Optional Parameters are:

  • SNS Topics
  • Tag
  • Attributes added to findings
  • Assessment Schedule

Network Assessments

  • Assessments: Network configuration analysis to check for ports reachable from outside the VPC
  • If the Inspector Agent is installed on your EC2 instances, the assessment also finds processes reachable on those ports
  • Price based on the number of instance assessments

Host Assessments

  • Assessments: Vulnerable software (CVE), host hardening (CIS benchmarks), and security best practices
  • Requires an agent (auto-install with SSM Run Command)
  • Price based on the number of instance assessments

AWS GuardDuty

  • Intelligent threat detection service
  • Not turned on by default
  • Detects account compromise, instance compromise, malicious reconnaissance, and bucket compromise
  • Continuous monitoring for events across:
    • AWS CloudTrail Management Events
    • AWS CloudTrail S3 Data Events
    • Amazon VPC Flow Logs
    • DNS Logs
  • For automated preventive actions based on Amazon GuardDuty findings, Amazon CloudWatch Events and AWS Lambda functions can be used.
  • Amazon GuardDuty findings are sent to Amazon CloudWatch Events based on match criteria, which triggers an AWS Lambda function that can, for example, modify VPC NACLs in response to the findings.
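
A sketch of the event-matching half of that automation (rule and target names are illustrative; the remediation Lambda is assumed to already exist):

```python
import json

import boto3

events = boto3.client("events")

# Match all GuardDuty findings (the pattern could also filter on severity/type)
events.put_rule(
    Name="guardduty-findings-rule",   # illustrative name
    EventPattern=json.dumps({
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
    }),
    State="ENABLED",
)

# Route matched findings to a pre-existing (hypothetical) remediation Lambda
events.put_targets(
    Rule="guardduty-findings-rule",
    Targets=[{
        "Id": "nacl-remediation",
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:remediate-nacl",
    }],
)
```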

Note

Key features:

  • Account-level threat detection to determine whether AWS accounts may have been compromised
  • The ability to create automated threat response actions
  • Monitoring of potential reconnaissance attempts
  • Monitoring of possible EC2 instance compromises
  • Monitoring of possible S3 bucket compromises

Tip

Amazon GuardDuty creates an event for Amazon CloudWatch Events when any change in findings occurs. For each of the findings, a finding ID is assigned. For newly generated findings with unique finding ID, notifications are sent within 5 minutes of the findings. But all subsequent findings are aggregated to a single event, and notifications are sent every 6 hours.

AWS Config

  • Evaluate your AWS resource configurations for desired settings
  • Get a snapshot of the current configurations of resources that are associated with your AWS account
  • Retrieve configurations of resources that exist in your account
  • Retrieve historical configurations of one or more resources
  • Receive a notification whenever a resource is created, modified, or deleted
  • View relationships between resources
  • Possibility of storing the configuration data into S3 (analyzed by Athena)

Tip

  • Questions that can be solved by AWS Config:
    • Is there unrestricted SSH access to my security groups?
    • Do my buckets have any public access?
    • How has my ALB configuration changed over time?
  • You can receive alerts (SNS notifications) for any changes (whenever a resource is created, modified, or deleted)
  • AWS Config is a per-region service
  • Can be aggregated across regions and accounts
  • AWS Config Dashboard provides an overview of resources, config rules and compliance status of both resources and rules. The dashboard displays details of the resources and rules only specific to a region and an account.

Example

Example Rules:

  • cloudtrail-enabled: Checks whether CloudTrail is enabled in your AWS account; with automatic remediation configured, this rule can re-enable CloudTrail if it is turned off
  • s3-bucket-versioning-enabled: Checks whether versioning is enabled for your S3 buckets
  • restricted-ssh: Checks whether security groups allow unrestricted SSH access
  • rds-instance-public-access-check: Checks whether RDS instances are publicly accessible
  • approved-amis-by-id: Checks whether the AMIs used by your EC2 instances are approved
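
A sketch of deploying one of these managed rules via boto3 (INCOMING_SSH_DISABLED is the managed-rule identifier behind restricted-ssh):

```python
import boto3

config = boto3.client("config")

# Deploy the AWS-managed "restricted-ssh" rule; managed rules are referenced
# by their source identifier rather than by name
config.put_config_rule(
    ConfigRule={
        "ConfigRuleName": "restricted-ssh",
        "Source": {
            "Owner": "AWS",
            "SourceIdentifier": "INCOMING_SSH_DISABLED",
        },
    }
)
```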

Aggregators

  • To aggregate compliance data from multiple accounts & multiple regions into a single account, Multi-Account Multi-Region Data Aggregation can be used.
  • For this, an Aggregator needs to be created in a region where aggregated AWS Config configuration and compliance data are required.
  • The Aggregator collects compliance data from multiple source accounts and from multiple regions. Source accounts that are part of AWS Organizations do not need to provide any authorization.
  • Source accounts that are not part of AWS Organizations require authorization which permits Aggregator to collect AWS Config configuration and compliance data.
  • An aggregator dashboard can be used to view the total number of resources and the non-compliance resources for multiple accounts and regions

AWS Resource Access Manager (RAM)

  • Shares resources:
    • Across AWS accounts
    • Within AWS Organizations or OUs
    • IAM roles and IAM users
  • Resource shares are created with:
    • The AWS RAM Console
    • AWS RAM APIs
    • AWS CLI
    • AWS SDKs
  • RAM can be used to share with:
    • AWS App Mesh
    • Amazon Aurora
    • AWS Certificate Manager Private Certificate Authority
    • AWS CodeBuild
    • Amazon EC2
    • EC2 Image Builder
    • AWS Glue
    • AWS License Manager
    • AWS Network Firewall
    • AWS Outposts
    • Amazon S3 on Outposts
    • AWS Resource Groups
    • Amazon Route 53
    • AWS Systems Manager Incident Manager
    • Amazon VPC

Amazon Macie

Macie is a fully managed data security and data privacy service

  • Uses machine learning and pattern matching to discover, monitor, and help you protect your sensitive data on Amazon S3 by creating and running sensitive data discovery jobs
  • Macie enables security compliance and preventive security as follows:
    • Identify a variety of data types, including PII, Protected Health Information (PHI), regulatory documents, API keys, and secret keys
    • Identify changes to policy and access control lists
    • Continuously monitor the security posture of Amazon S3
    • Generate security findings that you can view using the Macie console, AWS Security Hub, or Amazon EventBridge
    • Manage multiple AWS accounts using AWS Organizations

Amazon Detective

Amazon Detective analyzes, investigates, and quickly identifies the root cause of security issues or suspicious activities (using ML and graphs)

  • Automatically collects and processes events from VPC Flow Logs, CloudTrail, GuardDuty and create a unified view
  • Produces visualizations with details and context to get to the root cause

AWS Abuse

  • Report suspected AWS resources used for abusive or illegal purposes
  • Abusive & prohibited behaviors are:
    • Spam – receiving undesired emails from AWS-owned IP addresses, websites & forums spammed by AWS resources
    • Port scanning – sending packets to your ports to discover the unsecured ones
    • DoS or DDoS attacks – AWS-owned IP addresses attempting to overwhelm or crash your servers/software
    • Intrusion attempts – attempting to log in to your resources
    • Hosting objectionable or copyrighted content – distributing illegal or copyrighted content without consent
    • Distributing malware – AWS resources distributing software to harm computers or machines

Root user privileges

Root user = Account Owner (created when the account is created)

  • Has complete access to all AWS services and resources
  • Lock away your AWS account root user access keys!
  • Do not use the root account for everyday tasks, even administrative tasks
  • Actions that can be performed only by the root user:
    • Change account settings (account name, email address, root user password, root user access keys)
    • View certain tax invoices
    • Close your AWS account
    • Restore IAM user permissions
    • Change or cancel your AWS Support plan
    • Register as a seller in the Reserved Instance Marketplace
    • Configure an Amazon S3 bucket to enable MFA Delete
    • Edit or delete an Amazon S3 bucket policy that includes an invalid VPC ID or VPC endpoint ID
    • Sign up for GovCloud

AWS Directory Services

AWS Managed Microsoft AD

  • Fully managed AWS service
  • Best choice if you have more than 5000 users and/or need a trust relationship set up
  • Can perform schema extensions
  • Can set up trust relationships with on-premises Active Directories:
    • On-premise users and groups can access resources in either domain using SSO
    • Requires a VPN or Direct Connect connection
  • Can be used as a standalone AD in the AWS cloud

AD Connector

  • Redirects directory requests to your on-premise Active Directory
  • Best choice when you want to use an existing Active Directory with AWS services
  • AD Connector comes in two sizes:
    • Small – designed for organizations up to 500 users
    • Large – designed for organizations up to 5000 users
  • Requires a VPN or Direct Connect connection
  • Join EC2 instances to your on-premise AD through AD Connector
  • Log in to the AWS Management Console using your on-premises AD domain controllers for authentication

Simple AD

  • Inexpensive Active Directory-compatible service with common directory features
  • Standalone, fully managed, directory on the AWS cloud
  • Simple AD is generally the least expensive option
  • Best choice when you have fewer than 5000 users and don’t need advanced AD features
  • Features include:
    • Manage user accounts / groups
    • Apply group policies
    • Kerberos-based SSO
    • Supports joining Linux or Windows based EC2 instances

AWS License Manager

  • Used to manage licenses from software vendors
  • For example, manage your Microsoft, Oracle, SAP and IBM licenses
  • Centralized management of software licenses for AWS and on-premises resources
  • Can track license usage including when licensed based on virtual cores (vCPUs), sockets, or number of machines
  • Distribute, activate, and track software licenses across accounts and throughout an organization
  • Enforce limits across multiple Regions

AWS Trusted Advisor

AWS Trusted Advisor is a tool that performs checks on five different categories:

  • Cost optimization
  • Performance
  • Security
  • Fault tolerance
  • Service limits

Trusted Advisor provides real-time guidance to help you provision your resources following best practices

Tip

  • The AWS Support plan determines which Trusted Advisor checks are available for an account.
  • Basic and Developer Support provide access to the core Trusted Advisor checks, while Business and Enterprise Support provide access to all Trusted Advisor checks and guidance.

AWS Security Hub

The AWS Security Hub allows you to execute security checks across your AWS environment automatically.

  • Provides cloud security posture management
  • Run security best practice configuration checks such as
    • AWS Foundational Security Best Practices
    • Center for Internet Security (CIS)
    • Payment Card Industry (PCI DSS)
  • Aggregates alerts and provides automated remediation
  • Integrations include:
    • Amazon EventBridge – send alerts and trigger targets
    • Systems Manager – automation via runbooks
    • AWS Organizations – consolidated findings across accounts
    • Ticketing, chat, incident management, investigation, GRC, SOAR, and SIEM tools

It also allows you to gather alerts from the following security services into a central view:

  • Amazon GuardDuty
  • Amazon Inspector
  • IAM Access Analyzer
  • Amazon Macie
  • AWS Firewall Manager
  • AWS Systems Manager

AWS Systems Manager

AWS Systems Manager is a secure end-to-end management solution for resources on AWS, on premises, and on other clouds

  • Systems Manager is an operations hub for operations management, application management, change management and node management
  • You can manage Amazon EC2 instances and on-premises servers and virtual machines (VMs), including VMs hosted by other cloud providers
  • The SSM Agent must be installed on managed instances and servers
  • IAM permissions must be configured to allow management
  • For hybrid environments an IAM service role is used with hybrid activation
  • Systems Manager components include:
    • Automation
    • Run Command
    • Inventory
    • Patch Manager
    • Compliance
    • Session Manager
    • Parameter Store

Note

By default, AWS Systems Manager doesn't have permission to perform actions on your instances. Grant access by using an AWS Identity and Access Management (IAM) instance profile. An instance profile is a container that passes IAM role information to an Amazon Elastic Compute Cloud (Amazon EC2) instance at launch. You can create an instance profile for Systems Manager by attaching one or more IAM policies that define the necessary permissions to a new role or to a role you already created.

Systems Manager Patch Manager

  • Helps you select and deploy operating system and software patches automatically across large groups of Amazon EC2 or on-premises instances
  • Patch baselines:
    • Set rules to auto-approve select categories of patches to be installed
    • Specify a list of patches that override these rules and are automatically approved or rejected
  • You can also schedule maintenance windows for your patches so that they are only applied during predefined times
  • Systems Manager helps ensure that your software is up-to-date and meets your compliance policies
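
A sketch of kicking off a patch scan through Run Command (the instance ID is illustrative; AWS-RunPatchBaseline is the SSM document Patch Manager uses):

```python
import boto3

ssm = boto3.client("ssm")

# Run a patch compliance Scan (use "Install" to actually apply patches)
# against a managed instance; the instance ID is illustrative
ssm.send_command(
    InstanceIds=["i-0123456789abcdef0"],
    DocumentName="AWS-RunPatchBaseline",
    Parameters={"Operation": ["Scan"]},
)
```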

Systems Manager Compliance

  • AWS Systems Manager lets you scan your managed instances for patch compliance and configuration inconsistencies
  • You can collect and aggregate data from multiple AWS accounts and Regions, and then drill down into specific resources that aren’t compliant
  • By default, AWS Systems Manager displays data about patching and associations
  • You can also customize the service and create your own compliance types based on your requirements (must use the AWS CLI, AWS Tools for Windows PowerShell, or the SDKs)

Systems Manager Session Manager

  • Secure remote management of your instances at scale without logging into your servers
  • Replaces the need for bastion hosts, SSH, or remote PowerShell
  • Integrates with IAM for granular permissions
  • All actions taken with Systems Manager are recorded by AWS CloudTrail
  • Can store session logs in an S3 bucket and output to CloudWatch Logs
  • Requires IAM permissions for EC2 instance to access SSM, S3, and CloudWatch Logs

Tip

Doesn't require open ports: 22, 5985, 5986
No need for bastion hosts

Systems Manager Parameter Store

  • Parameter Store provides secure, hierarchical storage for configuration data management and secrets management
  • Highly scalable, available, and durable
  • Store data such as passwords, database strings, and license codes as parameter values
  • Store values as plaintext (unencrypted data) or ciphertext (encrypted data)
  • Reference values by using the unique name that you specified when you created the parameter
  • No native rotation of keys (difference with AWS Secrets Manager which does it automatically)

Machine Learning

Amazon Rekognition

  • Find objects, people, text, scenes in images and videos using ML
  • Facial analysis and facial search to do user verification, people counting
  • Create a database of "familiar faces" or compare against celebrities
  • Processes videos stored in an Amazon S3 bucket
  • Publish completion status to Amazon SNS Topic
  • Use cases:
    • Labeling
    • Content Moderation
    • Text Detection
    • Face Detection and Analysis (gender, age range, emotions...)
    • Face Search and Verification
    • Celebrity Recognition
    • Pathing
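
A minimal label-detection sketch (bucket and object names are illustrative):

```python
import boto3

rekognition = boto3.client("rekognition")

# Detect up to 10 labels (objects/scenes) in an image stored in S3;
# bucket and key are illustrative
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "photos/cat.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)
for label in response["Labels"]:
    print(label["Name"], label["Confidence"])
```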

Amazon Transcribe

  • Automatically convert speech to text
  • Uses a deep learning process called automatic speech recognition (ASR) to convert speech to text quickly and accurately
  • Automatically remove Personally Identifiable Information (PII) using Redaction
  • Supports Automatic Language Identification for multi-lingual audio
  • Use cases:
    • transcribe customer service calls
    • automate closed captioning and subtitling
    • generate metadata for media assets to create a fully searchable archive

Amazon Polly

  • Turns text into lifelike speech
  • Create applications that talk, and build entirely new categories of speech-enabled products
  • Text-to-Speech (TTS) service uses advanced deep learning technologies to synthesize natural sounding human speech

Amazon Translate

  • Neural machine translation service that delivers fast, high-quality, and affordable language translation
  • Uses deep learning models to deliver more accurate and more natural sounding translation
  • Localize content such as websites and applications for your diverse users

Amazon Lex & Connect

  • Amazon Lex: (same technology that powers Alexa)
    • Automatic Speech Recognition (ASR) to convert speech to text
    • Natural Language Understanding to recognize the intent of text, callers
    • Helps build chatbots, call center bots
  • Amazon Connect:
    • Receive calls, create contact flows, cloud-based virtual contact center
    • Can integrate with other CRM systems or AWS
    • No upfront payments, 80% cheaper than traditional contact center solutions

Amazon Comprehend

  • For Natural Language Processing(NLP)
  • Fully managed and serverless service
  • Uses machine learning to find insights and relationships in text
    • Language of the text
    • Extracts key phrases, places, people, brands, or events
    • Understands how positive or negative the text is
    • Analyzes text using tokenization and parts of speech
    • Automatically organizes a collection of text files by topic
  • Sample use cases:
    • analyze customer interactions (emails) to find what leads to a positive or negative experience
    • Create and group articles by topics that Comprehend will uncover

Amazon Comprehend Medical

  • Amazon Comprehend Medical detects and returns useful information in unstructured clinical text:
    • Physician’s notes
    • Discharge summaries
    • Test results
    • Case notes
  • Uses NLP to detect Protected Health Information (PHI) – DetectPHI API
  • Store your documents in Amazon S3, analyze real-time data with Kinesis Data Firehose, or use Amazon Transcribe to transcribe patient narratives into text that can be analyzed by Amazon Comprehend Medical.

Amazon SageMaker

  • Fully managed service for data scientists and developers to prepare, build, train, and deploy high-quality machine learning models
  • Typically, it is difficult to do all of these processes in one place while also provisioning servers
  • Simplified machine learning process example: predicting your exam score
  • ML development activities including:
    • Data preparation
    • Feature engineering
    • Statistical bias detection
    • Auto-ML
    • Training and tuning
    • Hosting
    • Monitoring
    • Workflows

Amazon Forecast

  • Fully managed service that uses ML to deliver highly accurate forecasts
  • Example: predict the future sales of a raincoat
  • 50% more accurate than looking at the data itself
  • Reduce forecasting time from months to hours
  • Use cases: Product Demand Planning, Financial Planning, Resource Planning, ...

Amazon Kendra

  • Fully managed document search service powered by Machine Learning
  • Extract answers from within a document (text, pdf, HTML, PowerPoint, MS Word, FAQs...)
  • Natural language search capabilities
  • Learn from user interactions/feedback to promote preferred results (Incremental Learning)
  • Ability to manually fine-tune search results (importance of data, freshness, custom, ...)

Amazon Personalize

  • Fully managed ML-service to build apps with real-time personalized recommendations
  • Same technology used by Amazon.com
  • Integrates into existing websites, applications, SMS, email marketing systems, ...
  • Implement in days, not months (you don’t need to build, train, and deploy ML solutions)
  • Use cases: retail stores, media and entertainment...

Amazon Textract

  • Automatically extracts text, handwriting, and data from any scanned documents using AI and ML
  • Extract data from forms and tables
  • Read and process any type of document (PDFs, images, ...)
  • Features:
    • Optical character recognition (OCR)
    • Identifies relationships, structure, and text
    • Uses AI to extract text and structured data
    • Recognizes handwriting as well as printed text
    • Can extract from documents such as PDFs, images, forms, and tables
    • Understands context; for example, it knows what data to extract from a receipt or invoice
  • Use cases:
    • Financial Services (e.g., invoices, financial reports)
    • Healthcare (e.g., medical records, insurance claims)
    • Public Sector (e.g., tax forms, ID documents, passports)

Amazon Elastic Transcoder

  • Transcodes video files to various formats / outputs
  • AWS Elemental MediaConvert is a newer service that provides more functionality and may be better for new use cases
  • Some features are only available on Elastic Transcoder such as:
    • WebM (VP8/VP9) input and output
    • Animated GIF output
    • MP3, FLAC, Vorbis, and WAV audio-only output
  • However, MediaConvert offers more options overall

Summary

  • Rekognition: face detection, labeling, celebrity recognition
  • Transcribe: audio to text (ex: subtitles)
  • Polly: text to audio
  • Translate: translations
  • Lex: build conversational bots – chatbots
  • Connect: cloud contact center
  • Comprehend: natural language processing
  • SageMaker: machine learning for every developer and data scientist
  • Forecast: build highly accurate forecasts
  • Kendra: ML-powered search engine
  • Personalize: real-time personalized recommendations
  • Textract: detect text and data in documents

Account Management, Billing & Support

AWS Organizations

A management service that enables you to consolidate multiple AWS accounts into an organization that you create and centrally manage

AWS Organizations

  • Global service
  • Available in two feature sets:
    • Consolidated Billing
    • All features
  • The main account is the management account
  • Other accounts are member accounts
    • Member accounts can only be part of one organization
  • Cost Benefits:
    • Consolidated Billing across all accounts - single payment method
    • Pricing benefits from aggregated usage (volume discount for EC2, S3...)
    • Pooling of Reserved EC2 instances for optimal savings
  • API is available to automate AWS account creation
  • Restrict account privileges using Service Control Policies (SCP)
  • An organization includes a root and OUs (Organizational Units)
  • Policies can be applied to the root, to OUs, or to individual accounts
  • Consolidated billing includes:
    • Paying Account – independent and cannot access resources of other accounts
    • Linked Accounts – all linked accounts are independent

Organizations API

The AWS Organizations API can be used to automate organization and account creation

Useful API actions:

  • CreateOrganization – creates an organization
  • CreateAccount – creates an account that is a member of the organization
  • CreatePolicy – creates a policy that can be attached to a root, OU, or individual AWS account
  • AttachPolicy - Attaches a policy to a root, an organizational unit (OU), or an individual account
  • InviteAccountToOrganization - Sends an invitation to another account to join your organization as a member account
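
A sketch of automated account creation (email and account name are illustrative); note that CreateAccount is asynchronous, so its status has to be polled:

```python
import boto3

org = boto3.client("organizations")

# CreateAccount is asynchronous: it returns a request status to poll
status = org.create_account(
    Email="dev-team@example.com",      # illustrative
    AccountName="dev-sandbox",
)["CreateAccountStatus"]

status = org.describe_create_account_status(
    CreateAccountRequestId=status["Id"]
)["CreateAccountStatus"]
print(status["State"])  # IN_PROGRESS | SUCCEEDED | FAILED
```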

Service Control Policies (SCP)

  • Whitelist or blacklist IAM actions
  • Applied at the OU(Organization Unit) or Account level
  • Does not apply to the management account
  • SCP is applied to all the Users and Roles of the account, including the root user
  • The SCP does not affect service-linked roles
  • Service-linked roles enable other AWS services to integrate with AWS Organizations and can't be restricted by SCPs.
  • SCP must have an explicit Allow (does not allow anything by default)
  • Use cases:
    • Restrict access to certain services (for example: can’t use EMR)
    • Enforce PCI compliance by explicitly disabling services
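
As a sketch (policy name, content, and target OU are illustrative), a deny-list SCP blocking EMR could be created and attached like this:

```python
import json

import boto3

org = boto3.client("organizations")

# Deny-list SCP: FullAWSAccess still flows down; this explicitly denies EMR
scp = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": "elasticmapreduce:*",
        "Resource": "*",
    }],
}

policy = org.create_policy(
    Name="DenyEMR",                       # illustrative
    Description="Block all EMR actions",
    Type="SERVICE_CONTROL_POLICY",
    Content=json.dumps(scp),
)

# Attach to an OU (the target ID is illustrative)
org.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-ab12-cdef3456",
)
```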

SCP Strategies and Inheritance

Deny List Strategy:

  • The FullAWSAccess SCP is attached to every OU and account
  • Explicitly allows all permissions to flow down from the root
  • Can explicitly override with a deny in an SCP
  • This is the default setup

Note

An explicit deny overrides any kind of allow

Allow List Strategy:

  • The FullAWSAccess SCP is removed from every OU and account
  • To allow a permission, SCPs with allow statements must be added to the account and every OU above it including root
  • Every SCP in the hierarchy must explicitly allow the APIs you want to use

AWS Control Tower

  • Easy way to set up and govern a secure and compliant multi-account AWS environment based on best practices (known as a landing zone)
  • Benefits:
    • Automate the set up of your environment in a few clicks
    • Automate ongoing policy management using guardrails
    • Detect policy violations and remediate them
    • Monitor compliance through an interactive dashboard
  • AWS Control Tower runs on top of AWS Organizations:
    • It automatically sets up AWS Organizations to organize accounts and implement SCPs (Service Control Policies)
  • Guardrails are used for governance and compliance:
    • Preventive guardrails are based on SCPs and disallow API actions
    • Detective guardrails are implemented using AWS Config rules and Lambda functions and monitor and govern compliance
  • The root user in the management account can perform actions that guardrails would disallow

AWS Organizations vs Control Tower

  • AWS Organizations:
    • Manage multiple accounts
    • Consolidated billing
    • Organizational Units
    • Service Control Policies
    • Tag Policies
    • Backup Policies
  • AWS Control Tower:
    • Extends capabilities of AWS Organizations
    • Landing Zones (best practices)
    • Federated Access (IAM Identity Center)
    • Centralized Logging
    • Account Factory (automation)
    • Guardrails (governance rules)

Pricing Models in AWS

  • AWS has 4 pricing models:
    • Pay as you go: pay for what you use, remain agile, responsive, meet scale demands
    • Save when you reserve: minimize risks, predictably manage budgets, comply with long-term requirements
      • Reservations are available for EC2 Reserved Instances, DynamoDB Reserved Capacity, ElastiCache Reserved Nodes, RDS Reserved Instance, Redshift Reserved Nodes
    • Pay less by using more: volume-based discounts
    • Pay less as AWS grows

Billing and Costing Tools

  • Estimating costs in the cloud:
    • Pricing Calculator
  • Tracking costs in the cloud:
    • Billing Dashboard
    • Cost Allocation Tags
    • Cost and Usage Reports
    • Cost Explorer
  • Monitoring against costs plans:
    • Billing Alarms
    • Budgets

AWS Compute Optimizer

  • Uses machine learning to analyze historical utilization metrics
  • Recommends optimal AWS resources for your workloads to reduce costs and improve performance
  • Offers optimization guidance for:
    • Amazon EC2 instances
    • Amazon EBS volumes
    • AWS Lambda functions
  • Results can be viewed in the console or via the CLI

AWS Cost Explorer

  • The AWS Cost Explorer is a free tool that allows you to view charts of your costs
  • You can view cost data for the past 13 months and forecast how much you are likely to spend over the next three months
  • Cost Explorer can be used to discover patterns in how much you spend on AWS resources over time and to identify cost problem areas
  • Cost Explorer can help you to identify service usage statistics such as:
    • Which services you use the most
    • View metrics for which AZ has the most traffic
    • Which linked account is used the most
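
A sketch of querying Cost Explorer programmatically (the date range is illustrative):

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

# Monthly unblended cost per service for an (illustrative) date range;
# Start is inclusive, End is exclusive
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-01-01", "End": "2023-04-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for period in response["ResultsByTime"]:
    print(period["TimePeriod"]["Start"], period["Groups"][:3])
```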

AWS Cost & Usage Report

  • Publish AWS billing reports to an Amazon S3 bucket
  • Reports break down costs by:
    • Hour, day, month, product, product resource, tags
  • Can update the report up to three times a day
  • Create, retrieve, and delete your reports using the AWS CUR API Reference

AWS Price List API

  • Query the prices of AWS services
  • Price List Service API (AKA the Query API) – query with JSON
  • AWS Price List API (AKA the Bulk API) – query with HTML
  • Alerts via Amazon SNS when prices change

Disaster Recovery

  • Backup
    • EBS Snapshots, RDS automated backups / Snapshots, etc...
    • Regular pushes to S3 / S3 IA / Glacier, Lifecycle Policy, Cross Region Replication
    • From On-Premise: Snowball or Storage Gateway
  • High Availability
    • Use Route53 to migrate DNS over from Region to Region
    • RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
    • Site-to-Site VPN as a recovery from Direct Connect
  • Replication
    • RDS Replication (Cross Region), AWS Aurora + Global Databases
    • Database replication from on-premises to RDS
    • Storage Gateway
  • Automation
    • CloudFormation / Elastic Beanstalk to re-create a whole new environment
    • Recover / Reboot EC2 instances with CloudWatch if alarms fail
    • AWS Lambda functions for customized automations
  • Chaos
    • Netflix has a "simian-army" randomly terminating EC2

Backup Strategies

There are four general ways to set up the backup of your environment that will also support full disaster recovery:

  • Backup & Restore:
    • RPO/RTO: Hours
    • Low priority workloads
    • Provision/restore after event
    • Cost $
  • Pilot Light:
    • RPO/RTO: 10s of minutes
    • Data replicated
    • Services idle/off
    • Resources activated after event
    • Cost $$
  • Warm Standby:
    • RPO/RTO: Minutes
    • Minimum resources always running
    • Business critical workloads
    • Scale up/out after event
    • Cost $$$
  • Multi-Site (Active-Active):
    • RPO/RTO: Real-time
    • Zero downtime
    • Near zero data loss
    • Mission critical workloads
    • Cost $$$$

RPO and RTO

Recovery Point Objective (RPO)

  • Measurement of the amount of data that can be acceptably lost
  • Measured in seconds, minutes, or hours
  • Example:
    • You can acceptably lose 2 hours of data in a database (2hr RPO)
    • This means backups must be taken every 2 hours

Recovery Time Objective (RTO)

  • Measurement of the amount of time it takes to restore after a disaster event
  • Measured in seconds, minutes, or hours
  • Example:
    • The IT department expects it to take 4 hours to bring applications online after a disaster
    • This would be an RTO of 4 hours

Achievable RPOs

| Recovery Point Objective | Technique |
| --- | --- |
| Milliseconds - Seconds | Synchronous replication |
| Seconds - Minutes | Asynchronous replication |
| Minutes to hours | Snapshots, cloud backup, D2D |
| Hours to days | Offsite / traditional backups / tape backups |

Tip

The RPO is determined by how you take a backup of data

Achievable RTOs

| Recovery Time Objective | Technique |
| --- | --- |
| Milliseconds to Seconds | Fault tolerance |
| Seconds to Minutes | High availability, load balancing, auto scaling |
| Minutes to hours | Cross-site recovery (cloud), automated recovery |
| Hours to days | Cross-site recovery (cloud/on-premises), manual recovery |

Tip

The RTO is determined by how quickly you can recover

AWS Backup

  • Fully managed service for automating data protection
  • Centralized backup management for many services:
    • Compute services such as EC2
    • Storage services such as EBS, S3, FSx, and EFS
    • Database services such as RDS, DynamoDB, and DocumentDB
    • Hybrid applications like VMware workloads on-premises or on VMware Cloud or Outposts
  • Policy-based data protection
  • Managed through the console, API, and CLI
  • Backups are encrypted with AWS KMS keys
  • Also provide protection from ransomware

Migration

DataBase Migration - DMS

DMS: Database Migration Service

  • Quickly and securely migrate databases to AWS; resilient and self-healing
  • The source database remains available during the migration
  • Use the Schema Conversion Tool for heterogeneous migrations
  • Use Cases:
    • Cloud to Cloud – EC2 to RDS, RDS to RDS, RDS to Aurora
    • On-Premises to Cloud
    • Homogeneous migrations – Oracle to Oracle, MySQL to RDS MySQL, Microsoft SQL to RDS for SQL Server
    • Heterogeneous migrations – Oracle to Aurora, Oracle to PostgreSQL, Microsoft SQL to RDS MySQL (must convert the schema first with the Schema Conversion Tool (SCT))
    • Development and Test – use the cloud for dev/test workloads
    • Database consolidation – consolidate multiple source DBs to a single target DB
    • Continuous Data Replication – use for DR, dev/test, single source multi-target or multi-source single target

Server Migration - AWS Server Migration Service (SMS)

  • AWS SMS migrates VMware vSphere, Microsoft Hyper-V/SCVMM, and Azure virtual machines to Amazon EC2
  • Automated, incremental and scheduled migrations
  • EC2 instances can be created from the AMIs
  • CloudWatch Events and Lambda can automate actions
  • Entire application group is launched from AMIs using CloudFormation template

Data Migration - Snow Family

Highly-secure, portable devices to collect and process data at the edge, and migrate data into and out of AWS

  • AWS Snowball and Snowmobile are used for migrating large volumes of data to AWS
  • Snowball Edge Compute Optimized
    • Provides block and object storage and optional GPU
    • Use for data collection, machine learning and processing, and storage in environments with intermittent connectivity (edge use cases)
  • Snowball Edge Storage Optimized
    • Provides block storage and Amazon S3-compatible object storage
    • Use for local storage and large-scale data transfer
  • Snowcone
    • Small device used for edge computing, storage and data transfer
    • Can transfer data offline or online with AWS DataSync agent

AWS OpsHub: software installed on your computer to manage Snow Family devices

Usage Process

  1. Request Snowball devices from the AWS console for delivery
  2. Install the snowball client / AWS OpsHub on your servers
  3. Connect the snowball to your servers and copy files using the client
  4. Ship back the device when you’re done (goes to the right AWS facility)
  5. Data will be loaded into an S3 bucket
  6. Snowball is completely wiped

Edge Computing

  • Snowcone (smaller)
    • 2 CPUs, 4 GB of memory, wired or wireless access
    • USB-C power using a cord or the optional battery
  • Snowball Edge – Compute Optimized
    • 52 vCPUs, 208 GiB of RAM
    • Optional GPU (useful for video processing or machine learning)
    • 42 TB usable storage
  • Snowball Edge – Storage Optimized
    • Up to 40 vCPUs, 80 GiB of RAM
    • Object storage clustering available
  • All: Can run EC2 Instances & AWS Lambda functions (using AWS IoT Greengrass)
  • Long-term deployment options: 1 and 3 years discounted pricing

Ways to optimize the performance of Snowball transfers

  • Use the latest Mac or Linux Snowball client
  • Batch small files together
  • Perform multiple copy operations at one time
  • Copy from multiple workstations
  • Transfer directories, not files

Use Cases

  • Cloud data migration – migrate data to the cloud
  • Content distribution – send data to clients or customers
  • Tactical Edge Computing – collect data and compute
  • Machine learning – run ML directly on the device
  • Manufacturing – data collection and analysis in the factory
  • Remote locations with simple data – pre-processing, tagging, compression etc

The 7 Rs of Migration

  • Refactor – Re-architect to a cloud-native serverless architecture
  • Replatform – Ex. database to RDS; server to Elastic Beanstalk
  • Repurchase – Use a different solution (e.g. SaaS)
  • Rehost – OS/App moved to another host system (lift & shift)
  • Relocate – Move without modification (lift & shift)
  • Retain – Leave as is (revisit at a later date)
  • Retire – No longer needed, get rid of it